ON A GENERAL SOLUTION FOR THE PARAMETERS OF ANY
FUNCTION WITH APPLICATION TO THE THEORY OF 
ORGANIC GROWTH 


By HARRY SYLVESTER WILL
Part I 


I. The Problem Stated. A type of problem which continually arises in the 
ordinary course of statistical analysis is that of determining the numerical values 
of the parameters of a function used to represent a series of observational data. 
In mathematical terminology, the problem may be stated as follows: 

Given, the observational series $Y_0, Y_1, \ldots, Y_n$.

Assumed, the function $y = f(x, a, b, c, \ldots)$.

To find, the numerical values of the parameters $a, b, c, \ldots$

If the function $f(x, a, b, c, \ldots)$ is linear in the parameters, the desired solution
is easily obtained by familiar methods. In cases where the function is not
linear, the standard procedure is to reduce it to the linear form by expansion 
into Taylor’s series, thus: 


$$f(x, a, b, c) = f(x, a_0, b_0, c_0) + f_a(x, a_0, b_0, c_0)\cdot\Delta a + f_b(x, a_0, b_0, c_0)\cdot\Delta b + f_c(x, a_0, b_0, c_0)\cdot\Delta c, \tag{1}$$

where $a = a_0 + \Delta a$, $b = b_0 + \Delta b$, $c = c_0 + \Delta c$.

The use of this method suffers from the excessive labor involved as the number 
of parameters to be determined increases. In cases where satisfactory values
of the first approximations $a_0, b_0, c_0$ are not obtainable, the solution becomes
impossible. The basic difficulty arises from the consideration that the Taylor
theorem requires that the increments $\Delta a$, $\Delta b$, $\Delta c$ shall be very small quantities.

A method of successive approximation which makes feasible the reduction of 
gross errors in the corrections will, I take it, be of considerable interest to 
mathematical statisticians. Let us, therefore, proceed to the development 
of a technique which accomplishes precisely this result. 


II. The Theta Technique. Let us begin our development with the follow- 
ing restatement of the technical problem involved: 

Given, the observational series $Y_0, Y_1, \ldots, Y_n$.

Assumed, the function $y = f(x, (a_0 + \theta_1\Delta a), (b_0 + \theta_2\Delta b), (c_0 + \theta_3\Delta c))$.

To find, the values of $\theta_1, \theta_2, \theta_3$.

In this set of relations, $a_0, b_0, c_0$ and $\Delta a, \Delta b, \Delta c$ are known quantities; while
$\theta_1$, $\theta_2$ and $\theta_3$ are each assumed not to exceed $\pm 1$ in value. It follows, therefore,
that the adjusted values of $a$, $b$, and $c$ lie within the bounds $a_0 \pm \Delta a$, $b_0 \pm \Delta b$,
$c_0 \pm \Delta c$. We may, then, write the following:

$$a_1 = a_0 - \Delta a; \quad a_2 = a_0 + \Delta a.$$
$$b_1 = b_0 - \Delta b; \quad b_2 = b_0 + \Delta b. \tag{2}$$
$$c_1 = c_0 - \Delta c; \quad c_2 = c_0 + \Delta c.$$

The values of $\theta_1$, $\theta_2$ and $\theta_3$ are determined by the following procedure:

First, form the function $y$ from all possible combinations of $a_1 a_2$, $b_1 b_2$, $c_1 c_2$,
thus:






$$y_{111} = f(x, a_1 b_1 c_1).$$
$$y_{112} = f(x, a_1 b_1 c_2).$$
$$\cdots \tag{3}$$
$$y_{222} = f(x, a_2 b_2 c_2).$$

In the case of $p$ parameters, we can evidently form $2^p$ distinct sets of $n$ values
for the function $y_{ijk}$. Since the assigned values of parameters are mere
approximations to their true values, each computed set of values for the function $y_{ijk}$
will differ from the true values $y = f(x, abc)$.

Second, form the theoretical residuals $y_{ijk} - y$, and then compute the
corresponding standard errors of estimate $\sigma_{ijk}$. There will, accordingly, be $2^p$ values
of $\sigma$ determined, each value being a measure of the error committed in assuming
the corresponding approximations to parameters; thus, $\sigma_{111}$ measures the errors
committed in assuming the combination $a_1 b_1 c_1$; $\sigma_{112}$ measures the errors
committed in assuming $a_1 b_1 c_2$; $\ldots$; $\sigma_{222}$ measures the errors committed in assuming
$a_2 b_2 c_2$.

Third, taking the squared reciprocal of $\sigma$ as a measure of the reliability of a
given determination of $y_{ijk}$ from the parameters $a_i b_j c_k$, we may form the
following comparative tests of the reliability of the $2^p$ sets of the values of $y_{ijk}$, thus:


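$$\omega_{111} = \sigma_{111}^{-2} : (\sigma_{111}^{-2} + \sigma_{112}^{-2} + \cdots + \sigma_{222}^{-2}) = \sigma_{111}^{-2} : \Sigma\sigma^{-2}$$
$$\omega_{112} = \sigma_{112}^{-2} : \Sigma\sigma^{-2}$$
$$\cdots \tag{4}$$
$$\omega_{222} = \sigma_{222}^{-2} : \Sigma\sigma^{-2}$$

Omega, we shall term the test constant. Obviously, $\Sigma\omega_{ijk} = 1$.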


Fourth, assuming three parameters, let us tabulate the possible subscripts of 
omega according to the following scheme: 


$\omega(a_1)$   $\omega(a_2)$   $\omega(b_1)$   $\omega(b_2)$   $\omega(c_1)$   $\omega(c_2)$
    111             211             111             121             111             112
    121             221             211             221             211             212
    112             212             112             122             121             122
    122             222             212             222             221             222























In this table, the subscripts are in the order of $abc$; so that 111 denotes
$\omega(a_1 b_1 c_1)$; 112 denotes $\omega(a_1 b_1 c_2)$; etc. Comparing columns $\omega(a_1)$ and $\omega(a_2)$, we
observe that the $bc$ subscripts are identical for both; while the $a_1$ subscripts of
the first column are replaced by the $a_2$ subscripts in the second column. Again,
comparing columns $\omega(b_1)$ and $\omega(b_2)$, we see that the $ac$ subscripts are identical
for both; while the $b_1$ subscripts of the one column are replaced by the $b_2$
subscripts in the other. Finally, comparing columns $\omega(c_1)$ and $\omega(c_2)$, we note that
the $ab$ subscripts are identical for both; while the $c_1$ subscripts of the one column
are replaced by $c_2$ subscripts in the other.

Fifth, let us form the column summations $\Sigma\omega(a_1)$, $\Sigma\omega(a_2)$; $\Sigma\omega(b_1)$, $\Sigma\omega(b_2)$; and
$\Sigma\omega(c_1)$, $\Sigma\omega(c_2)$. Since the columns $\omega(a_1)$ and $\omega(a_2)$ differ only with respect to the
$a$ subscripts, the difference in value between the sums $\Sigma\omega(a_1)$ and $\Sigma\omega(a_2)$ can be
due to differences in value between $a_1$ and $a_2$ only, and is not at all affected
by differences in value between $b_1 b_2$ and $c_1 c_2$. $\Sigma\omega(a_1)$ and $\Sigma\omega(a_2)$ may, therefore,
be regarded as the weights of $a_1$ and $a_2$ to be used in determining the adjusted
value of $a$; for $\Sigma\omega(a_1) + \Sigma\omega(a_2) = 1$.

We may, then, write the following relations:

$$a = \Sigma\omega(a_1)\cdot a_1 + \Sigma\omega(a_2)\cdot a_2 = \Sigma\omega(a_1)\cdot(a_0 - \Delta a) + \Sigma\omega(a_2)\cdot(a_0 + \Delta a)$$
$$= (\Sigma\omega(a_1) + \Sigma\omega(a_2))\cdot a_0 + (\Sigma\omega(a_2) - \Sigma\omega(a_1))\cdot\Delta a = a_0 + \theta(a)\cdot\Delta a. \tag{5}$$

Since precisely similar reasoning applies to the parameters $b_1$, $b_2$ and $c_1$, $c_2$,
we have the following definitive formulas for computing the values of theta:

$$\theta(a) = \Sigma\omega(a_2) - \Sigma\omega(a_1).$$
$$\theta(b) = \Sigma\omega(b_2) - \Sigma\omega(b_1). \tag{6}$$
$$\theta(c) = \Sigma\omega(c_2) - \Sigma\omega(c_1).$$

As the adjusted values of parameters, we have:

$$a = a_0 + \theta(a)\cdot\Delta a.$$
$$b = b_0 + \theta(b)\cdot\Delta b. \tag{7}$$
$$c = c_0 + \theta(c)\cdot\Delta c.$$


In this development of the theta technique, we have determined $\sigma_{ijk}$ from the
theoretical residuals $y_{ijk} - y$. This has served well the purposes of exposition;
but, since the true values of the function $y$ are unknown, we must, in practice,
compute $\sigma_{ijk}$ from the observational residuals $y_{ijk} - Y$. Later in the memoir,
it will be shown how the computation of $\theta$ may, in numerous cases, be
considerably abridged.
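
The five steps of the theta technique can be sketched in code. What follows is only a minimal illustration, not the author's own computing scheme: the function name `theta_step`, the straight-line test model `line` and its data are hypothetical, and the standard error is taken as the root mean square of the observational residuals (an assumption, since the text does not fix the divisor).

```python
import itertools

def theta_step(f, xs, ys, center, delta):
    """One pass of the theta technique for a model f(x, *params)."""
    p = len(center)
    inv_var = {}
    for combo in itertools.product((0, 1), repeat=p):
        # 0 selects the lower trial value (center - delta), 1 the upper one.
        params = [c + (2 * j - 1) * d for c, d, j in zip(center, delta, combo)]
        ss = sum((f(x, *params) - y) ** 2 for x, y in zip(xs, ys))
        # Squared reciprocal of the standard error; assumes no trial set fits exactly.
        inv_var[combo] = len(xs) / ss
    total = sum(inv_var.values())
    omega = {combo: w / total for combo, w in inv_var.items()}  # test constants
    theta = []
    for i in range(p):
        upper = sum(w for combo, w in omega.items() if combo[i] == 1)
        lower = sum(w for combo, w in omega.items() if combo[i] == 0)
        theta.append(upper - lower)  # formulas (6)
    adjusted = [c + t * d for c, t, d in zip(center, theta, delta)]  # formulas (7)
    return theta, adjusted


def line(x, a, b):
    # Hypothetical test model, chosen only because its true parameters are known.
    return a + b * x

xs = list(range(10))
ys = [line(x, 2.0, 3.0) for x in xs]
theta, adjusted = theta_step(line, xs, ys, center=[1.5, 3.2], delta=[1.0, 0.5])
print(theta, adjusted)
```

In this toy case the first theta is positive and the second negative, so both parameters move toward the values used to generate the data; repeating the step with the adjusted values as new centers corresponds to the successive approximation described above.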


Part II 


III. The Principle of Malthus. Since a determination of the numerical
parameters of a given function by means of the theta technique must, at best,
involve a considerable amount of computation, I have chosen for purposes of
demonstration a problem which is of much interest in itself. This problem, we
shall state in the form of two questions:

First, what is the most appropriate mathematical form of the law of organic 
growth? 

Second, how may the parameters of the indicated function be computed? 

Thomas R. Malthus, in his famous essay on The Principle of Population Growth
assumed that the proportional growth of human populations is properly defined
by the differential equation,

$$\frac{1}{p}\frac{dp}{dt} = -b, \tag{8}$$

where $p$ is the population under consideration, $t$ is the measure of time, and $b$ is
the stable or geometric rate of growth.

This formula has been destructively criticised on the ground that it fails 
wholly to give a mathematical description of the manner in which population 
growth is kept within bounds. So far as any implication of the formula is 
concerned, populations may grow to infinite magnitudes. An attempt to 
represent growth by its use must, therefore, result in a succession of discontinui- 
ties which are incompatible with the observed facts of organic growth. 









IV. The Symmetric Logistic. In three memoirs published in 1838, 1845 
and 1847, it was suggested by M. Verhulst, Professor of Mathematics in the 
École Militaire in Brussels, that the rate of population growth might be stated
as a function of the population itself. Assuming the limiting value of $p$ to be
$H$, this conception of the growth rate Verhulst expressed by the differential
equation,

$$\frac{1}{p}\frac{dp}{dt} = -b(1 - H^{-1}p). \tag{9}$$








Since this equation expresses proportional growth as a linear function of p, 
it is the simplest relation of its kind that may be conceived. In representing the 
rate of growth as a quantity which approaches zero as the population approaches 
its limiting value, it makes, indeed, a significant advance over the Malthusian 
formula. Nevertheless, the equation is subject to an interesting limitation, 
the nature of which is made evident by an examination of the integral form of the 
function, namely: 


$$p = H : [1 + e^{a+bt}]. \tag{10}$$

This we shall now prove to be rotationally symmetric with respect to the point
of inflection.
Differentiating equation (9) a second time, we have,

$$d^2p = -b\,d[p(1 - H^{-1}p)]\,dt$$
$$= -b\,dp\,dt + 2bH^{-1}p\,dp\,dt$$
$$= p^{-1}dp^2 + bH^{-1}p\,dp\,dt.$$

Hence,

$$\frac{d^2p}{dt^2} = b^2p(1 - H^{-1}p)^2 - b^2H^{-1}p^2(1 - H^{-1}p).$$

Setting $d^2p/dt^2 = 0$, we get,

$$1 - 2H^{-1}p = 0.$$

Or

$$p = H/2,$$

which gives the value of $p$ at the point of inflection.
Substituting for $p$ from (10), and solving for $t$, we have,

$$t_i = -a/b, \tag{12}$$

where $t_i$ is the point of inflection of the function $p$.
Denoting the magnitude of the population at time $t_i$ by $p_i$, its magnitude at
time $t_{i+k}$ by $p_{i+k}$, and its magnitude at the time $t_{i-k}$ by $p_{i-k}$, we have,

$$p_i = H : [1 + e^{a+bt_i}] = H/2. \tag{13}$$
$$p_{i+k} = H : [1 + e^{a+b(t_i+k\Delta t)}] = H : [1 + e^{bk\Delta t}]. \tag{14}$$
$$p_{i-k} = H : [1 + e^{a+b(t_i-k\Delta t)}] = H : [1 + e^{-bk\Delta t}]. \tag{15}$$

Measuring $p$ in units of $H$ and setting $u = e^{b\Delta t}$, we may rewrite these last
three equations as follows:

$$H^{-1}p_i = 1/2.$$
$$H^{-1}p_{i+k} = 1 : [1 + u^k].$$
$$H^{-1}p_{i-k} = 1 : [1 + u^{-k}].$$

On the hypothesis of rotational symmetry, we have, by subtraction,

$$H^{-1}p_{i+k} - 1/2 = 1/2 - H^{-1}p_{i-k}.$$

In proof, we have:

$$1 : [1 + u^k] = 1 - 1 : [1 + u^{-k}]$$
$$= u^{-k} : [1 + u^{-k}]$$
$$= 1 : [u^k + 1].$$

q. e. d.
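
The symmetry just proved is easy to confirm numerically. The sketch below is illustrative only: the parameter values are arbitrary, and the helper names `logistic` and `symmetry_gap` are not taken from the memoir.

```python
import math

def logistic(t, H, a, b):
    """Symmetric logistic, equation (10)."""
    return H / (1.0 + math.exp(a + b * t))

def symmetry_gap(H, a, b, k, dt=1.0):
    """(p_{i+k} - H/2) - (H/2 - p_{i-k}), measured about the inflection t_i = -a/b."""
    t_i = -a / b
    above = logistic(t_i + k * dt, H, a, b) - H / 2.0
    below = H / 2.0 - logistic(t_i - k * dt, H, a, b)
    return above - below

# Arbitrary illustrative parameters; the gap should vanish for every rank k.
gaps = [symmetry_gap(100.0, 2.0, -0.5, k) for k in range(1, 6)]
print(gaps)
```

Every gap is zero up to rounding error, whatever values of $H$, $a$ and $b$ are supplied.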


Part III 


V. Criticisms of the Logistic. Because of its symmetric form, many critics
have called into question the finality of the logistic as a universal
representation of population growth. That it applies in particular cases, they
contend, is no reason for holding that it must apply in general. Professors Raymond
Pearl and Lowell J. Reed of Johns Hopkins University, to whom we are
indebted for the rediscovery of the earlier researches of Verhulst, have proposed,
as the proper form of the generalized growth curve, the following function:





$$p = H : [1 + e^{a+bt+ct^2+dt^3}]. \tag{16}$$


In their view, this equation is suited not only to representing a single cycle of 
growth, but two successive cycles as well. This claim, however, must be 
rejected; for, if true, it would mean that one cycle of growth is predictable from 
another, a circumstance which is clearly inconsistent with the assumptions laid 
down by these same investigators. 

Moreover, so far as I can learn from their published writings, these authors 
have never considered the implications of the differential form of the function 
they propose. 

Differentiating (16), we have,

$$\frac{1}{p}\frac{dp}{dt} = -(b + 2ct + 3dt^2)(1 - H^{-1}p).$$

Here, we find the stable growth constant of Malthus replaced by an expression
which is quadratic in $t$. This means that, for a population which is freed of a
restraining limit, proportional growth tends generally toward infinite values.
If there are any facts to support such a conception of organic growth, I do not 
know what they are, and must, perforce, reject the contention that equation 
(16) is the generalized form of the Verhulst function. 


VI. Fundamental Assumptions. In order to represent the phenomenon
of population growth mathematically, I hold the following assumptions to be
necessary:

(a) Under favoring conditions, population may increase at a constant
geometric rate.

(b) Under all circumstances, the rate of growth must be a finite and continuous
quantity.

(c) The magnitude of a population is always a positive, real number. 

(d) The growth of population tends toward restriction within definite bounds. 

(e) The growth of population is a function of time. 

(f) The basic conditions of growth are free of cataclysmic disturbances. 

The first of these assumptions is given in recognition of well known facts 
concerning organic growth. The second is necessary because, even when the 
size of a population is freed of definite restriction, the pattern of growth is not 
necessarily geometric. The third assumption affirms the absurdity of represent- 
ing a population as a negative or infinite quantity. The fourth merely asserts 
the indisputable fact that the organism must always grow in a finite environment. 
The fifth gives place to the concept of growth as the resultant of a complex of
causes, no one of which can be isolated as an entirely independent variable,
while the final assumption recognizes that major disturbing influences may
profoundly affect the course of growth.


VII. The Skew Logistic. In accord with our fundamental assumptions, we 
may form the following differential equations: 


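$$\frac{1}{p'}\frac{dp'}{dt} = -[b + sm\cdot\cos(m(t + q))]\,[1 - H^{-1}p'] \qquad \text{Type } \alpha$$
$$\frac{1}{p'}\frac{dp'}{dt} = -[b + 2sm^2(t + q) : (1 + m^2(t + q)^2)]\,[1 - H^{-1}p'] \qquad \text{Type } \beta \tag{17}$$
$$\frac{1}{p'}\frac{dp'}{dt} = -[b + sm^2(t + q) : \sqrt{1 + m^2(t + q)^2}]\,[1 - H^{-1}p'] \qquad \text{Type } \gamma$$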


In these equations, p’ = p — L, and measures p from its lower limit as origin. 
On separating variables, the following integrations may be performed: 


$$-\int dp' : [p'(1 - H^{-1}p')] = -\log[p' : (1 - H^{-1}p')] = \log[(H - p') : (Hp')].$$

Writing $z = m(t + q)$, $dz = m\,dt$; so that we have:

$$b\int dt + s\int \cos z\,dz = A + bt + s\cdot\sin z.$$
$$b\int dt + 2s\int [z : (1 + z^2)]\,dz = A + bt + s\cdot\log(1 + z^2).$$
$$b\int dt + s\int [z : \sqrt{1 + z^2}]\,dz = A + bt + s\cdot\sqrt{1 + z^2}.$$

From these integrals, we form the following equations:

$$\log[(H - p') : (Hp')] = A + bt + s\cdot\sin[m(t + q)].$$
$$\log[(H - p') : (Hp')] = A + bt + s\cdot\log[1 + m^2(t + q)^2].$$
$$\log[(H - p') : (Hp')] = A + bt + s\cdot\sqrt{1 + m^2(t + q)^2}.$$

We have, finally, on taking antilogarithms and making the substitutions
$p = p' + L$, $a = A + \log H$:

$$p = L + H : [1 + e^{a+bt+s\cdot\sin(m(t+q))}] \qquad \text{Type } \alpha$$
$$p = L + H : [1 + e^{a+bt+s\cdot\log(1+m^2(t+q)^2)}] \qquad \text{Type } \beta \tag{18}$$
$$p = L + H : [1 + e^{a+bt+s\sqrt{1+m^2(t+q)^2}}] \qquad \text{Type } \gamma$$

These equations give the normal forms of the skew logistic.
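
As an illustration of how the integral forms (18) relate to the differential forms (17), the following sketch evaluates the three types and provides the right-hand side of the Type γ equation for comparison with a numerical derivative. The function names and the parameter values in the usage note are hypothetical, and natural logarithms are assumed throughout.

```python
import math

def skew_term(t, m, q, kind):
    """Skewing function multiplying s in the exponent of equation (18)."""
    z = m * (t + q)
    if kind == "alpha":
        return math.sin(z)
    if kind == "beta":
        return math.log(1.0 + z * z)
    if kind == "gamma":
        return math.sqrt(1.0 + z * z)
    raise ValueError(f"unknown type: {kind}")

def skew_logistic(t, L, H, a, b, s, m, q, kind="gamma"):
    """Normal form of the skew logistic, equation (18)."""
    return L + H / (1.0 + math.exp(a + b * t + s * skew_term(t, m, q, kind)))

def gamma_rate(t, L, H, a, b, s, m, q):
    """Right-hand side of (17), Type gamma: proportional growth of p' = p - L."""
    p_prime = skew_logistic(t, L, H, a, b, s, m, q, "gamma") - L
    u = t + q
    return -(b + s * m * m * u / math.sqrt(1.0 + (m * u) ** 2)) * (1.0 - p_prime / H)
```

With $s = 0$ all three types collapse to the symmetric logistic (10) shifted by $L$; for example, `skew_logistic(3.0, 1.0, 50.0, 2.0, -0.4, 0.0, 0.8, -5.0, "alpha")` equals `1.0 + 50.0 / (1.0 + math.exp(2.0 - 0.4 * 3.0))`.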

VIII. Properties of the Skew Logistic. We may deduce the properties of
the skew logistic by examining both its differential and integral forms.
Considering the derivative of Type $\alpha$, we note that the Malthusian constant $b$ is
replaced by a trigonometric function whose amplitude is $b \pm sm$, and whose
phase depends on the values of $m$ and $q$. When $b \pm sm = 0$, the derivative
must also equal zero, and a flat point in the curve of $p$ is indicated. When $b$ is
absolutely less than $sm$, the derivative changes sign and the curve of $p$ reverses
its direction. Thus, the integral form of Type $\alpha$ modifies the symmetric form
of the logistic by a succession of minor cycles in which the rate of growth is
alternately accelerated and retarded.

Considering Type $\beta$, we find the Malthusian constant replaced by a function
whose maximum and minimum values are attained when $t = \pm m^{-1} - q$.
Obviously, therefore, this function passes through a single period whose amplitude
is $b \pm sm$, and whose phases are $b$, $b + sm$, $b$, $b - sm$, $b$. When $b \pm sm = 0$,
a flat point in the curve of $p$ is generated. The effect of skewness on the rate of
growth passes through two double phases. Where $b$ and $s$ are of the same sign,
these phases are: first, increasing retardation followed by decreasing retardation
when $t + q$ is negative; and, second, increasing acceleration followed by
decreasing acceleration when $t + q$ is positive. Where $b$ and $s$ are of opposite sign,
the corresponding phases are: first, increasing acceleration followed by
decreasing acceleration when $t + q$ is negative; and, second, increasing retardation
followed by decreasing retardation when $t + q$ is positive. It is to be noted
that, when $sm$ is absolutely greater than $b$, the derivative will change sign twice
before the upper limit is reached. Under these circumstances, the function $p$
passes through a double reversal of direction.

Considering Type $\gamma$, we find the Malthusian constant of the derivative
replaced by a function which is aperiodic and which approaches the limits $b \pm sm$
as $t$ approaches $\pm\infty$. When $b$ and $s$ are of the same sign, skewness passes through
the two following phases: first, the phase of decreasing retardation when $t + q$
is negative; and, second, the phase of increasing acceleration when $t + q$ is
positive. On the other hand, when $b$ and $s$ are of opposite sign, the
corresponding phases are: first, that of decreasing acceleration when $t + q$ is negative; and,
second, that of increasing retardation when $t + q$ is positive. When $sm$ is
absolutely greater than $b$, the derivative changes sign, and the function $p$ passes
from a continuously increasing phase to a continuously decreasing phase, or
vice versa.

In general, it may be said of all three types ($\alpha$, $\beta$ and $\gamma$) that, if the derivative
is not restricted to a single change of sign, $L$ denotes a lower asymptote of the
function $p$; while, under the same conditions, $H$ denotes the higher limit
approached by the function $p - L$. When $H$ is negative, the effect is to make $L$
an upper, and $L - H$ a lower, asymptote of the curve $p$.

In the case of Type $\gamma$, when the derivative makes a single change of sign,
either $H$ or $L$ becomes a maximum (or minimum) value instead of an asymptote
of the curve. In this event, it will be noted that the factor $1 - H^{-1}p'$ appearing
in the derivative does not approach zero as a limit with increasing values of $t$,
but rather passes through a minimum and then approaches the limit 1 in either
direction.

The parameter $s$ may be positive or negative in sign, and is termed the index
of skewness or, briefly, the skewness of the function. Obviously, $m$ is always
positive, and, since it determines the rate at which skewness develops, is properly
termed the development. The point in time at which skewness passes from an
accelerating to a retarding phase, or vice versa, is fixed by the value of $q$, which is,
therefore, termed the transition. The parameter $b$, as has already been stated,
is termed the stable growth tendency or, technically, the stability of the function.
And since the position of the curve $p$ on an arbitrary time scale will vary with
the value of $a$, this parameter I have designated the location.

In all three types of the skew logistic, if $e^{\psi}$ is a continuously decreasing
function and both $H$ and $L$ are positive, the curve of $p$ may be described as of the
rising hillside form. In the case of Type $\gamma$, if the derivative changes from
positive to negative sign, the curve may be described as mountain formed. If
$e^{\psi}$ increases continuously, the curve is of the falling hillside variety, except
when the derivative of Type $\gamma$ changes from negative to positive sign, in which
event a valley form is generated.


Part IV


IX. Parameters of the Symmetric Logistic. The numerical parameters of the
symmetric logistic (10) are most easily determined by the method of differences.
First, we write,

$$p_i^{-1} = C + e^{A+bt}, \tag{19}$$

where $C = H^{-1}$; $A = a - \log H$; and $i = 0, 1, 2, \ldots, n - 1$.
Assuming $\Delta t$ constant, let us give to $t$ the increment $k\Delta t$, thus:

$$p_{i+k}^{-1} = C + e^{A+b(t+k\Delta t)}. \tag{20}$$

Subtracting (19) from (20), we obtain

$$\Delta_k p_i^{-1} = e^{A+b(t+k\Delta t)} - e^{A+bt} = Be^{A+bt}, \tag{21}$$

where $B = e^{bk\Delta t} - 1$. The quantity $\Delta_k p_i^{-1} = p_{i+k}^{-1} - p_i^{-1}$ is termed a first order
difference of rank $k$.
Giving to $t$ in equation (21) the increment $k\Delta t$, we get

$$\Delta_k p_{i+k}^{-1} = Be^{A+b(t+k\Delta t)}. \tag{22}$$

Dividing (22) by (21), we have,

$$\Delta_k p_{i+k}^{-1} : \Delta_k p_i^{-1} = e^{bk\Delta t}.$$

Taking logarithms, we obtain

$$\Delta_k \log \Delta_k p_i^{-1} = \log \Delta_k p_{i+k}^{-1} - \log \Delta_k p_i^{-1} = bk\Delta t,$$

which defines the parameter $b$. We can form $n - 2k$ such equations.
$b$ is uniquely determined by the relation

$$b = \Big[\sum_{i=0}^{n-2k-1} \Delta_k \log \Delta_k P_i^{-1}\Big] : [k(n - 2k)\Delta t]$$
$$= \Big[\sum_{i=k}^{n-k-1} \log \Delta_k P_i^{-1} - \sum_{i=0}^{n-2k-1} \log \Delta_k P_i^{-1}\Big] : [k(n - 2k)\Delta t], \tag{23}$$

where $k = n : 3$ to the nearest integer.
Returning to (21), we have the following relation determining the value of $A$:

$$A = \log\Big[\sum_{i=0}^{n-k-1} \Delta_k P_i^{-1}\Big] - \log\Big[B\sum_{i=0}^{n-k-1} e^{bt}\Big]$$
$$= \log\Big[\sum_{i=k}^{n-1} P_i^{-1} - \sum_{i=0}^{n-k-1} P_i^{-1}\Big] - \log\Big[B\sum_{i=0}^{n-k-1} e^{bt}\Big], \tag{24}$$

where $k = n : 2$ to the nearest integer.
From equation (19), we have

$$C = \Big[\sum_{i=0}^{n-1} P_i^{-1} - e^{A}\sum_{i=0}^{n-1} e^{bt}\Big] : n. \tag{25}$$

The values of $H$ and $a$ are, obviously, given by

$$H = C^{-1}. \tag{26}$$
$$a = A + \log H. \tag{27}$$

In the relations defining $b$, $A$ and $C$, the values of $P$ must be obtained from
the observations. In computing the values of $k$, the formula is:

$$k = n(r + 1)^{-1},$$

where $n$ is the number of observations, and $r$ denotes the order of reduction
involved in the defining relation.

In my first treatment of the subject, I assumed that the value of $k$ for all
orders of reduction might be determined from the reduction of highest order
involved; but I have since found that I erred in this view. The point is that the
function $\psi(k) = k^r(n - rk)$, discussed in the original memoir, must be
maximized with respect to $k$ separately for each order of difference involved; or, in
other words, the rank constant $k$ must be given a separate determination for
each parameter defined if the most accurate results are to be obtained.
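
The relations (23) to (27) can be tried on synthetic data. The sketch below is a hedged illustration rather than the author's worksheet: it assumes natural logarithms, a time origin at the first observation, differences that all share one sign, and it truncates $n : 2$ downward (as is done for $n = 15$ in Section XI). The function name `fit_symmetric_logistic` is hypothetical.

```python
import math

def fit_symmetric_logistic(P, dt=1.0):
    """Recover H, a, b of p = H / (1 + e^(a + b t)) by the method of differences."""
    n = len(P)
    R = [1.0 / p for p in P]                      # reciprocals P_i^{-1}
    times = [i * dt for i in range(n)]

    # Equation (23): second-order reduction, rank k = n/3.
    k = round(n / 3)
    logs = [math.log(abs(R[i + k] - R[i])) for i in range(n - k)]
    b = (sum(logs[k:n - k]) - sum(logs[:n - 2 * k])) / (k * (n - 2 * k) * dt)

    # Equation (24): first-order reduction, rank k = n/2 (truncated).
    k = n // 2
    B = math.exp(b * k * dt) - 1.0
    sum_diff = sum(R[k:]) - sum(R[:n - k])
    A = math.log(sum_diff / (B * sum(math.exp(b * t) for t in times[:n - k])))

    # Equations (25) to (27).
    C = (sum(R) - math.exp(A) * sum(math.exp(b * t) for t in times)) / n
    H = 1.0 / C
    a = A + math.log(H)
    return H, a, b

# Noise-free synthetic series with known parameters (illustrative values).
P = [200.0 / (1.0 + math.exp(3.0 - 0.3 * i)) for i in range(15)]
H, a, b = fit_symmetric_logistic(P)
print(H, a, b)
```

On exact data the three parameters are returned to within rounding error; with real observations the averaging over $n - 2k$ and $n - k$ equations is what smooths the irregularities.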


X. Parameters of the Skew Logistic. I shall now show how the method 
of differences may be used to abridge the computations involved in applying 
the theta technique to the determination of the parameters of the skew logistic. 
In this, as in the preceding section, we assume At constant. 

Operating on Type $\gamma$ of equation (18), we write

$$p_i = L + H : [1 + e^{a+bt+s\sqrt{1+m^2(t+q)^2}}]. \tag{28}$$

To begin with, let us write the transformation of ordinate

$$G = \log[H(p - L)^{-1} - 1].$$

Also, let us write

$$F = \sqrt{1 + m^2(t + q)^2}.$$

We may now rewrite equation (28) in the form

$$G_i = a + bt + sF_i. \tag{29}$$

Giving to $t$ the increment $k\Delta t$, we have

$$G_{i+k} = a + b(t + k\Delta t) + sF_{i+k}. \tag{30}$$

Subtracting (29) from (30), we have,

$$\Delta_k G_i = bk\Delta t + s\Delta_k F_i. \tag{31}$$

Again giving to $t$ the increment $k\Delta t$, we obtain

$$\Delta_k G_{i+k} = bk\Delta t + s\Delta_k F_{i+k}. \tag{32}$$

Subtracting (31) from (32), we obtain

$$\Delta_k G_{i+k} - \Delta_k G_i = (bk\Delta t - bk\Delta t) + s(\Delta_k F_{i+k} - \Delta_k F_i),$$

$$\Delta_k^2 G_i = s\Delta_k^2 F_i. \tag{33}$$


We can form $n - 2k$ such equations, and may, therefore, form $n - 2k$
approximations to the value of the parameter $s$, as follows:

$$s_i = [\Delta_k^2 G_i] : [\Delta_k^2 F_i]; \quad i = 0, 1, \ldots, n - 2k - 1.$$

Taking the mean value of the set $s_i$ as its most probable value, we have,

$$s_0(HL\cdot mq) = \Sigma s_i : (n - 2k); \quad k = n : 3 \text{ to the nearest integer.} \tag{34}$$
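
The elimination of $a$ and $b$ by second differences may be checked with a few lines of code. In this sketch the values of $a$, $b$ and $s$ are hypothetical, while $m = 0.8$ and $q = -8$ are the lower trial values used later in Section XII; the names `F` and `second_difference` are merely illustrative.

```python
import math

def F(t, m, q):
    """F = sqrt(1 + m^2 (t + q)^2), as defined after equation (28)."""
    return math.sqrt(1.0 + (m * (t + q)) ** 2)

def second_difference(values, k):
    """Second-order difference of rank k: v[i+2k] - 2 v[i+k] + v[i]."""
    return [values[i + 2 * k] - 2.0 * values[i + k] + values[i]
            for i in range(len(values) - 2 * k)]

n, k = 15, 5
m, q = 0.8, -8.0                      # trial values m_1, q_1 of Section XII
a, b, s = 1.7, -0.136, 0.05           # hypothetical values, for illustration only
ts = list(range(n))
Fs = [F(t, m, q) for t in ts]
Gs = [a + b * t + s * f for t, f in zip(ts, Fs)]   # equation (29)
d2F = second_difference(Fs, k)
d2G = second_difference(Gs, k)
s_i = [g / f for g, f in zip(d2G, d2F)]            # equation (33) solved for s
s_0 = sum(s_i) / len(s_i)                          # equation (34)
print(d2F, s_0)
```

Every $s_i$ reproduces the assumed $s$ because the terms in $a$ and $b$ cancel exactly; with observed data and imperfect trial values of $H$, $L$, $m$, $q$, the $s_i$ scatter about their mean, and that scatter is what the next step measures.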


In this determination of $s_0$, the only parameters directly involved are $H$, $L$, $m$
and $q$, the parameters $a$ and $b$ having been eliminated. By assigning values to
$H_0$, $L_0$, $m_0$ and $q_0$, we may, on setting up the arbitrary corrections $\Delta H$, $\Delta L$, $\Delta m$
and $\Delta q$, write down the following:

$$H_1 = H_0 - \Delta H; \quad H_2 = H_0 + \Delta H; \quad L_1 = L_0 - \Delta L; \quad L_2 = L_0 + \Delta L;$$
$$m_1 = m_0 - \Delta m; \quad m_2 = m_0 + \Delta m; \quad q_1 = q_0 - \Delta q; \quad q_2 = q_0 + \Delta q.$$

Since $s_0$ is a function of $H$, $L$, $m$ and $q$, we may, by entering the subscripts of
the combination $HL\cdot mq$, tabulate the possible determinations of $s_0$ as follows:

11-11 11-12 11-21 11-22
12-11 12-12 12-21 12-22
21-11 21-12 21-21 21-22
22-11 22-12 22-21 22-22

In this tabulation, the subscripts of parameters are in the order of $HL\cdot mq$;
so that 12-21 denotes $s_0(H_1 L_2 \cdot m_2 q_1)$, etc.

From the table, it is seen that we may compute $2^4 = 16$ distinct sets of
approximations to $s_0(HL\cdot mq)$. Since the true values of $H$, $L$, $m$ and $q$ are unknown,
each set of approximations $s_i$ will show a characteristic variation about its mean
value, $s_0$. This variation is most conveniently measured by the mean deviation

$$\epsilon = (s_u - s_0)\cdot 2N'' : N = (s_0 - s_l)\cdot 2N' : N, \tag{35}$$

where the second relation serves as a check on the computation by the first;
$N = n - 2k$; $N'$ denotes the number of items $s_i$ which are less than $s_0$ in value,
and $N''$, the number of items $s_i$ which are greater than $s_0$ in value; while $s_l$ denotes
the mean of the $N'$ values of $s_i$ which are less than $s_0$, and $s_u$, the mean of the
$N''$ values of $s_i$ which are greater than $s_0$.
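
The two forms of (35) and the ordinary definition of the mean deviation can be compared directly. The sketch below uses the five values of $s_i$ listed under $s(1.11)$ in Table III(a); the function name `mean_deviation` is illustrative, and the code assumes the set is not constant, so that items exist on both sides of the mean.

```python
def mean_deviation(s):
    """Mean s_0 and mean deviation of a set s_i, with both checks of equation (35)."""
    N = len(s)
    s0 = sum(s) / N
    eps = sum(abs(v - s0) for v in s) / N          # direct definition
    lower = [v for v in s if v < s0]               # the N' items below the mean
    upper = [v for v in s if v > s0]               # the N'' items above the mean
    eps_upper = (sum(upper) / len(upper) - s0) * 2 * len(upper) / N
    eps_lower = (s0 - sum(lower) / len(lower)) * 2 * len(lower) / N
    return s0, eps, eps_upper, eps_lower

# Column s(1.11) of Table III(a).
s0, eps, eps_upper, eps_lower = mean_deviation(
    [-0.00883, 0.00236, 0.00438, -0.00207, -0.00239])
print(s0, eps, eps_upper, eps_lower)
```

All three computations give a mean deviation of about 0.00374 about the mean of -0.00131, in agreement with the tabulated entries.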

The reliability of a given value of $s_0$ as a measure of the central tendency of
the corresponding set $s_i$ is sufficiently determined by $\epsilon^{-2}$, which serves at the
same time to measure the reliability of the combination $HLmq$ figuring in the
computation of the given set $s_i$. We may, therefore, compute the values of the
test constant, $\omega$, directly from the values of $\epsilon^{-2}$ by means of the relation,

$$\omega(HL\cdot mq) = \epsilon_j^{-2} : (\epsilon_{11\cdot 11}^{-2} + \epsilon_{11\cdot 12}^{-2} + \cdots + \epsilon_{22\cdot 22}^{-2}) = \epsilon_j^{-2} : \Sigma\epsilon^{-2}, \tag{36}$$

where $j = 11\cdot 11, 11\cdot 12, \ldots, 22\cdot 22$; $\Sigma\omega = 1$.
Since four values of theta are to be determined, we must arrange the sixteen
values of omega in four ways, as shown by the following tabulation of subscripts:

$\omega(H_1)$  $\omega(H_2)$  $\omega(L_1)$  $\omega(L_2)$  $\omega(m_1)$  $\omega(m_2)$  $\omega(q_1)$  $\omega(q_2)$
   11-11          21-11          11-11          12-11          11-11          11-21          11-11          11-12
   11-12          21-12          11-12          12-12          11-12          11-22          11-21          11-22
   11-21          21-21          11-21          12-21          12-11          12-21          12-11          12-12
   11-22          21-22          11-22          12-22          12-12          12-22          12-21          12-22
   12-11          22-11          21-11          22-11          21-11          21-21          21-11          21-12
   12-12          22-12          21-12          22-12          21-12          21-22          21-21          21-22
   12-21          22-21          21-21          22-21          22-11          22-21          22-11          22-12
   12-22          22-22          21-22          22-22          22-12          22-22          22-21          22-22

Knowing the values of omega, we have at once,

$$\theta(H) = \Sigma\omega(H_2) - \Sigma\omega(H_1); \quad \theta(L) = \Sigma\omega(L_2) - \Sigma\omega(L_1);$$
$$\theta(m) = \Sigma\omega(m_2) - \Sigma\omega(m_1); \quad \theta(q) = \Sigma\omega(q_2) - \Sigma\omega(q_1). \tag{37}$$

$$H = H_0 + \theta(H)\cdot\Delta H; \quad L = L_0 + \theta(L)\cdot\Delta L;$$
$$m = m_0 + \theta(m)\cdot\Delta m; \quad q = q_0 + \theta(q)\cdot\Delta q. \tag{38}$$


The process of adjustment should be repeated until errors in the parameters 
diminish to negligible proportions. 

With $H$, $L$, $m$ and $q$ known to a sufficient approximation, we may form anew
the functions $G(H, L, m, q)$ and $F(H, L, m, q)$. We can then write $n - 2k$
equations of form (33), viz.:

$$\Delta_k^2 G_i = s\Delta_k^2 F_i.$$

Summing these equations, we have,

$$\Sigma\Delta_k^2 G_i = s\Sigma\Delta_k^2 F_i,$$

where

$$\Sigma\Delta_k^2 G_i = \sum_{i=2k}^{n-1} G_i - 2\sum_{i=k}^{n-k-1} G_i + \sum_{i=0}^{n-2k-1} G_i;$$
$$\Sigma\Delta_k^2 F_i = \sum_{i=2k}^{n-1} F_i - 2\sum_{i=k}^{n-k-1} F_i + \sum_{i=0}^{n-2k-1} F_i;$$

where $k = n : 3$ to the nearest integer.
The approximate value of $s$ is now obtained from the relation

$$s = \Big[\sum_{i=0}^{n-2k-1} \Delta_k^2 G_i\Big] : \Big[\sum_{i=0}^{n-2k-1} \Delta_k^2 F_i\Big]. \tag{40}$$

Returning to equation (31), we solve for $bk\Delta t$, obtaining,

$$bk\Delta t = \Delta_k G_i - s\Delta_k F_i.$$

Since we can form $n - k$ such equations, the approximate value of $b$ is given
by the relation

$$b = \Big[\Big(\sum_{i=n-k}^{n-1} G_i - \sum_{i=0}^{k-1} G_i\Big) - s\Big(\sum_{i=n-k}^{n-1} F_i - \sum_{i=0}^{k-1} F_i\Big)\Big] : [k(n - k)\Delta t], \tag{41}$$

where $k = n : 2$ to the nearest integer.
From equation (29), we obtain the approximate value of $a$ as follows:

$$a = \Big[\sum_{i=0}^{n-1} G_i - b\sum_{i=0}^{n-1} t_i - s\sum_{i=0}^{n-1} F_i\Big] : n. \tag{42}$$

Comparing the abridged method of computing the values of theta here
outlined with the general procedure of section II, it will be seen that we have been
able to reduce the number of values of omega which it is necessary to determine
from $2^7 = 128$ to $2^4 = 16$. In cases where $L$ may be assumed to equal zero,
the number of values of omega which must be computed is further reduced to
$2^3 = 8$.


Part V 


XI. Symmetric Parameters for the Population of the United States. I
have determined the numerical values of the parameters of both the symmetric
and the skew forms of the logistic from the population figures for the
United States given by the Bureau of the Census. The only departure in the
data from the census figures consists in the interpolation of all items to June
1st as the date of observation. The values of the symmetric parameters are
computed from the data of Table I, as follows.

Setting $k = 15 \div 3 = 5$, we have, by equation (23),

$$\sum_{i=0}^{4} \Delta_5 \log \Delta_5 P_i^{-1} = \sum_{i=5}^{9} \log \Delta_5 P_i^{-1} - \sum_{i=0}^{4} \log \Delta_5 P_i^{-1}$$
$$= \bar{9}.71878_n - \bar{5}.14555_n = -3.42677$$

TABLE I
Data for the Symmetric Logistic

 i     $P^{-1}$    $\Delta_5 P^{-1}$    $\log \Delta_5 P^{-1}$

 0     0.25582     -0.19724     $\bar{1}.29500_n$
 1     0.18939     -0.14627     $\bar{1}.16516_n$
 2     0.13885     -0.10704     $\bar{1}.02955_n$
 3     0.10431     -0.07838     $\bar{2}.89421_n$
 4     0.07770     -0.05776     $\bar{2}.76163_n$
 5     0.05858     -0.04269     $\bar{2}.63033_n$
 6     0.04312     -0.02996     $\bar{2}.47654_n$
 7     0.03181     -0.02098     $\bar{2}.32181_n$
 8     0.02593     -0.01650     $\bar{2}.21748_n$
 9     0.01994     -0.01182     $\bar{2}.07262_n$
10     0.01589
11     0.01316
12     0.01083
13     0.00943
14     0.00812
 Σ     1.00288     -0.70864     $\overline{14}.86433_n$

(A barred figure is a negative characteristic; the subscript $n$ marks the logarithm of a negative quantity.)

TABLE II(a)
Data for the Skew Logistic

 i     $G_1$       $G_2$       $F_{11}$    $F_{12}$    $F_{21}$    $F_{22}$

 0    +1.67998    +1.71132     6.47765     3.35261     9.65194     4.90306
 1    +1.54968    +1.57779     5.68859     2.60000     8.45931     3.73631
 2    +1.40690    +1.43878     4.90306     1.88680     7.26911     2.60000
 3    +1.27698    +1.30927     4.12311     1.28062     6.08276     1.56205
 4    +1.14130    +1.17416     3.35261     1.00000     4.90306     1.00000
 5    +1.00816    +1.04179     2.60000     1.28062     3.73631     1.56205
 6    +0.85948    +0.89428     1.88680     1.88680     2.60000     2.60000
 7    +0.70540    +0.74193     1.28062     2.60000     1.56205     3.73631
 8    +0.59699    +0.63515     1.00000     3.35261     1.00000     4.90306
 9    +0.44841    +0.48956     1.28062     4.12311     1.56205     6.08276
10    +0.30840    +0.35346     1.88680     4.90306     2.60000     7.26911
11    +0.17992    +0.22981     2.60000     5.68859     3.73631     8.45931
12    +0.02885    +0.08647     3.35261     6.47765     4.90306     9.65194
13    -0.09590    -0.02968     4.12311     7.26911     6.08276    10.84620
14    -0.25808    -0.17670     4.90306     8.06226     7.26911    12.04159
 Σ   +10.83647   +11.47739    49.45864    55.76384    71.41783    80.95375

TABLE II(b)
Data for the Skew Logistic

 i    $\Delta_5^2 G_1$   $\Delta_5^2 G_2$   $\Delta_5^2 F_{11}$   $\Delta_5^2 F_{12}$   $\Delta_5^2 F_{21}$   $\Delta_5^2 F_{22}$

 0    -0.02794    -0.01880     3.16445     5.69443     4.77932     9.04807
 1    +0.01064    +0.01904     4.51499     4.51499     6.99562     6.99562
 2    +0.02495    +0.04139     5.69443     3.16445     9.04807     4.77932
 3    -0.01290    +0.00929     6.24622     1.84451    10.16552     2.60213
 4    -0.01360    +0.01834     5.69443     0.81604     9.04807     0.87607
 Σ    -0.01885    +0.06926    25.31452    16.03442    40.03660    24.30121

We note that $k(n - 2k)\Delta t = 5(15 - 10)1 = 25$; hence,

$$b = -3.42677 \div 25 = -0.1370708.$$

Next, set $k = 15 \div 2 = 7$, to the nearest integer; then, by equation (24), we
get

$$\sum_{i=0}^{7} \Delta_7 P_i^{-1} = \sum_{i=8}^{14} P_i^{-1} - \sum_{i=0}^{6} P_i^{-1} = 0.10330 - 0.86777 = -0.76447;$$
$$B = 10^{bk\Delta t} - 1 = 10^{-0.1370708 \times 7} - 1 = -0.89022; \quad \sum_{i=0}^{7} 10^{bt} = 3.39884.$$

Hence,

$$A = \log[-0.76447] - \log[-0.89022 \times 3.39884] = \bar{1}.4025324.$$

We have next

$$\Sigma P_i^{-1} = 1.00288; \quad \sum_{i=0}^{14} 10^{bt} = 3.66216; \quad 10^{A} = 0.25266.$$

By equation (25), then, we obtain

$$C = [1.00288 - 0.25266 \times 3.66216] \div 15 = 0.0051747.$$

By equation (26), we get

$$H = C^{-1} = 193.25.$$

Finally, by equation (27), we obtain

$$a = A + \log H = \bar{1}.4025324 + 2.2861136 = 1.68865.$$

The point of inflection of the curve is given by

$$t_i = -a : b = 1.68865 \div 0.1370708 = 12.319.$$
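
The arithmetic of this section can be retraced from the printed sums. The sketch below assumes common (base 10) logarithms, as the appearance of $10^{bt}$ indicates, and reads the value of $A$ as a logarithm with a negative characteristic, that is, as about $-0.59747$. The value of $C$ is taken as printed rather than recomputed, since the rounded inputs reproduce it only to about six decimals.

```python
import math

k_b, n, dt = 5, 15, 1.0
b = -3.42677 / (k_b * (n - 2 * k_b) * dt)   # equation (23)
B = 10.0 ** (b * 7 * dt) - 1.0              # rank k = 7 is used for A
A = math.log10(-0.76447 / (B * 3.39884))    # equation (24), as a signed logarithm
C = 0.0051747                               # value printed after equation (25)
H = 1.0 / C                                 # equation (26)
a = A + math.log10(H)                       # equation (27)
t_i = -a / b                                # equation (12)
print(b, B, A, H, a, t_i)
```

The printed results are recovered to the number of figures given: $b = -0.1370708$, $H = 193.25$, $a = 1.68865$ and $t_i = 12.319$ decades from the first census.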


XII. Skew Parameters for the Population of the United States. Assuming
L = 0, we form

$H_1$ = 198.0 − 7.0 = 191.0;   $H_2$ = 198.0 + 7.0 = 205.0.
$m_1$ = 1.0 − 0.2 = 0.8;   $m_2$ = 1.0 + 0.2 = 1.2.
$q_1$ = −6.0 − 2.0 = −8.0;   $q_2$ = −6.0 + 2.0 = −4.0.

Next, the primary data of Tables II(a) and II(b) are computed. Setting
k = 15 ÷ 5 = 3, the values of the $2^3$ sets of $s_i$ are determined and entered
in Table III(a). The values of $s_0$, ε and w for each set are computed by equa-
tions (34), (35) and (36).

In Table III(b), the several values of w are arranged according to their associa-
tion: first, with $H_1$, $H_2$; second, with $m_1$, $m_2$; and, third, with $q_1$, $q_2$. The column
sums yield the weights Σw. The values of θ and the adjusted values of param-
eters are computed by equations (37) and (38):

TABLE III(a)
Data for the Computation of θ

 i    s(1.11)    s(1.12)    s(1.21)    s(1.22)    s(2.11)    s(2.12)    s(2.21)    s(2.22)

 0   −0.00883   −0.00491   −0.00585   −0.00309   −0.00594   −0.00330   −0.00393   −0.00208
 1   +0.00236   +0.00236   +0.00152   +0.00152   +0.00422   +0.00422   +0.00272   +0.00272
 2   +0.00438   +0.00788   +0.00276   +0.00522   +0.00727   +0.01308   +0.00457   +0.00866
 3   −0.00207   −0.00699   −0.00127   −0.00496   +0.00149   +0.00504   +0.00091   +0.00357
 4   −0.00239   −0.01667   −0.00150   −0.01552   +0.00322   +0.02247   +0.00203   +0.02093

 Σ   −0.00655   −0.01832   −0.00434   −0.01683   +0.01026   +0.04151   +0.00630   +0.03380
 s₀  −0.00131   −0.00366   −0.00087   −0.00337   +0.00205   +0.00830   +0.00126   +0.00676
 ε   +0.00374   +0.00703   +0.00241   +0.00550   +0.00342   +0.00404   +0.00222   +0.00643
 w   +0.10624   +0.03012   +0.25711   +0.04919   +0.12708   +0.09137   +0.30290   +0.03601

















TABLE III(b)
Data for the Computation of θ

        w(H₁)     w(H₂)     w(m₁)     w(m₂)     w(q₁)     w(q₂)

       0.1062    0.1271    0.1062    0.2571    0.1062    0.0301
       0.0301    0.0914    0.0301    0.0492    0.2571    0.0492
       0.2571    0.3029    0.1271    0.3029    0.1271    0.0914
       0.0492    0.0360    0.0914    0.0360    0.3029    0.0360

 Σ     0.4426    0.5574    0.3548    0.6452    0.7933    0.2067
TABLE IV(a)
Summary of Adjustments

 Parameter   Estimated      Δ         θ          Δ·θ       Adjusted
               Value                                         Value

     H        +198.0      +7.0     +0.1148    +0.8036     +198.80
     m        +  1.0      +0.2     +0.2904    +0.05808    +1.05808
     q        −  6.0      +2.0     −0.5866    −1.1732     −7.1732
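
The figures of Tables III(b) and IV(a) are mutually consistent in a simple way: each θ equals the weight sum attached to the upper trial value less the weight sum attached to the lower one, and each adjusted value is the starting value plus Δ·θ. The sketch below merely retraces that arithmetic from the tabulated numbers; it is an observation about the tables and is not offered as a restatement of equations (37) and (38), which are not reproduced here. The dictionary names are illustrative.

```python
# Column sums of Table III(b): (lower trial value, upper trial value).
weight_sums = {
    "H": (0.4426, 0.5574),
    "m": (0.3548, 0.6452),
    "q": (0.7933, 0.2067),
}
start = {"H": 198.0, "m": 1.0, "q": -6.0}   # starting values
delta = {"H": 7.0, "m": 0.2, "q": 2.0}      # half-widths of the trial intervals

# theta: difference of the two weight sums (assumed reading of the tables).
theta = {k: upper - lower for k, (lower, upper) in weight_sums.items()}

# Adjusted value: starting value plus delta times theta.
adjusted = {k: start[k] + delta[k] * theta[k] for k in start}

for k in ("H", "m", "q"):
    print(k, round(theta[k], 4), round(adjusted[k], 5))
```

Because the two weight sums for a parameter add to unity, θ necessarily lies between −1 and +1, so the adjusted value cannot fall outside the trial interval.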
TABLE IV(b)
Final Transformations

   t      G(Hmq)       F(Hmq)

   0     1.69772      7.65559
   1     1.56410      6.60800
   2     1.42495      5.56440
   3     1.29526      4.52752
   4     1.15991      3.50336
   5     1.02722      2.50753
   6     0.87921      1.59408
   7     0.72613      1.01784
   8     0.61866      1.32865
   9     0.47182      2.17626
  10     0.33408      3.15374
  11     0.20842      4.17075
  12     0.06189      5.20418
  13     1̄.94223      6.24580
  14     1̄.78913      7.29229

   Σ    11.20073     62.54999

Finally, the functions G and F are formed anew from the adjusted values of
H, m, q. The adjusted values of s, b and a are computed by equations (40),
(41) and (42), as follows:


$$s = \Big(\sum_{10}^{14} G_t - 2\sum_{5}^{9} G_t + \sum_{0}^{4} G_t\Big) : \Big(\sum_{10}^{14} F_t - 2\sum_{5}^{9} F_t + \sum_{0}^{4} F_t\Big)$$
$$= [0.33574 - 2 \times 3.72304 + 7.14194] \div [26.06676 - 2 \times 8.62436 + 27.85887]$$
$$= 0.03161 \div 36.67691 = 0.00086185.$$

$$b = \Big[\sum_{8}^{14} G_t - \sum_{0}^{6} G_t - s\Big(\sum_{8}^{14} F_t - \sum_{0}^{6} F_t\Big)\Big] : [k(m - k)\Delta t]$$
$$= [1.42623 - 9.04837 - 0.00086185(29.57167 - 31.96048)] \div [7(15 - 7)1]$$
$$= [-7.62214 - 0.00086185 \times (-2.38881)] \div [56] = -0.13607.$$

$$a = \Big(\sum_{0}^{14} G_t - b\sum_{0}^{14} t - s\sum_{0}^{14} F_t\Big) : n$$
$$= [11.20073 - (-0.13607 \times 105) - 0.00086185 \times 62.54999] \div 15$$
$$= 1.69561.$$
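
The three computations above can be retraced directly from the columns of Table IV(b). The following is a sketch under stated assumptions: the grouping of the fifteen ordinates (three blocks of five for s, the first and last seven for b) is inferred from the printed partial sums, and the two entries with a negative characteristic are entered as −1 plus the mantissa.

```python
# Columns of Table IV(b), t = 0..14.
G = [1.69772, 1.56410, 1.42495, 1.29526, 1.15991,
     1.02722, 0.87921, 0.72613, 0.61866, 0.47182,
     0.33408, 0.20842, 0.06189, -1 + 0.94223, -1 + 0.78913]
F = [7.65559, 6.60800, 5.56440, 4.52752, 3.50336,
     2.50753, 1.59408, 1.01784, 1.32865, 2.17626,
     3.15374, 4.17075, 5.20418, 6.24580, 7.29229]

n = len(G)   # 15 observations
k = 7        # size of the end groups used for b
dt = 1       # interval between observations, in decades

# s: second difference of the three block sums of G over that of F.
s = ((sum(G[10:]) - 2 * sum(G[5:10]) + sum(G[:5]))
     / (sum(F[10:]) - 2 * sum(F[5:10]) + sum(F[:5])))

# b: difference of the end-group sums, corrected for the skew term.
b = ((sum(G[-k:]) - sum(G[:k]) - s * (sum(F[-k:]) - sum(F[:k])))
     / (k * (n - k) * dt))

# a: mean level after removing the linear and skew terms.
a = (sum(G) - b * sum(range(n)) - s * sum(F)) / n

print(round(s, 8), round(b, 5), round(a, 5))
```

The recomputed values agree with the printed ones to within the rounding of the tabulated entries.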

In the present case, the values of $H_0$, $m_0$ and $q_0$ were known within definite
limits from previous experimentation. The values of the corrections, θ·Δ,
were, on this account, smaller than should ordinarily be expected from a first
application of the technique. Always, it is necessary to take Δ sufficiently
large to insure θ < 1. As a preliminary step, it is not infrequently advantageous
to compute trial values of ε by holding constant each two of the parameters
$H_0$, $m_0$ and $q_0$ while experimenting roughly with the third.
TABLE V(a)
Ordinates of Fitted Curves

 Year    Census    Symmetric   Percentage     Skew      Percentage
          Count    Ordinates   Deviations   Ordinates   Deviations

 1790     3.909       3.88       −0.78         3.87       −1.01
 1800     5.280       5.28       −0.03         5.27       −0.25
 1810     7.202       7.16       −0.52         7.15       −0.73
 1820     9.587       9.69       +1.07         9.67       +0.88
 1830    12.866      13.04       +1.37        13.02       +1.20
 1840    17.069      17.45       +2.22        17.42       +2.09
 1850    23.192      23.15       −0.20        23.13       −0.28
 1860    31.443      30.38       −3.36        30.37
 1870    38.558      39.36       +2.09        39.31
 1880    50.156      50.18       +0.05        50.07
 1890    62.948      62.61       −0.31        62.60
 1900    75.995      76.79       +1.05        76.64
 1910    92.329      91.76       −0.62        91.42
 1920   106.001     106.96       +0.90       107.16
 1930   123.068     121.66       −1.14       122.23

TABLE V(b)
Extrapolations

 Year    Forecast    Sym. O.    Sk. O.

 1940     137.20     135.22     136.26
 1950     149.29     147.18     148.78
 1960     159.88     157.33     159.52
 1970     168.71     165.66     168.42
 1980     175.83     172.33     175.59
 1990     181.46     177.52     181.25
 2000     185.82     181.52     185.63
 2010     189.14     184.55     188.98
 2020     193.11     186.82     192.97
 2030     193.54     188.52     193.40
 2040     194.94     189.77     194.83
 2050     195.98     190.72     195.87
 2060     196.75     191.39     196.64
 2070     197.31     191.88     197.22
 2080     197.73     192.25     197.64
 2090     198.03     192.52     197.94
 2100     198.25                198.17
 2110     198.42                198.34
 2120     198.54                198.46

 Year    Sym. O.    Sk. O.

 1780     2.844
 1770     2.083
 1760     1.523
 1750     1.113
 1740     0.813
 1730     0.594
 1720     0.434
 1710     0.316
 1700     0.231
 1690     0.168
 1680     0.123
 1670     0.090
 1660     0.065
 1650     0.048
 1640     0.035
Part VI


XIII. General Considerations. The technique of solution for the numeri- 
cal values of parameters presented in the foregoing pages is generally applicable 
to continuous functions of real variables. The abridged procedure may be 
followed whenever the given function involves a component which is linear in 
certain of the parameters: for, in such cases, it is always possible to effect a 
transformation of ordinates which will permit of the elimination of the param- 
eters of the linear component. In any event, the equation of the function 
may be solved for a single parameter which may then be employed, as in our 
illustration, as a means of determining the values of the test constant, omega. 


XIV. An Interpretation of Results. The equations of the symmetric and 
skew logistic curves as computed for the population of the United States are, 
written to the natural base, as follows: 


$$p = 193.25 : \left[1 + e^{3.88826 - 0.31562t}\right]$$
$$p = 198.80 : \left[1 + e^{3.90429 - 0.31331t + 0.0019845\sqrt{1 + 1.0581^2 (t - 7.1732)^2}}\right]$$
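
As a check on the two equations, the sketch below evaluates them numerically. It assumes that t is counted in decades from the census of 1790, which is the convention implied by the fifteen census ordinates; the function names are illustrative only.

```python
import math

def symmetric(t):
    """Symmetric logistic, natural base; t in decades from 1790 (assumed origin)."""
    return 193.25 / (1 + math.exp(3.88826 - 0.31562 * t))

def skew(t):
    """Skew logistic, natural base, with the radical term in m and q."""
    root = math.sqrt(1 + 1.0581 ** 2 * (t - 7.1732) ** 2)
    return 198.80 / (1 + math.exp(3.90429 - 0.31331 * t + 0.0019845 * root))

def decades_from_1790(year):
    return (year - 1790) / 10

for year in (1790, 1930, 1940):
    t = decades_from_1790(year)
    print(year, round(symmetric(t), 2), round(skew(t), 2))
```

The ordinates so obtained for 1790 and 1940 agree with the entries of Tables V(a) and V(b) to within rounding.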


The amount of skewness in the second of these equations, as measured by 
the value of s, is small; but, owing to the fair size of the parameter m, it de- 
velops rapidly and affects the form of the curve sensibly. The major effect 
is to raise the value of the limiting population as given in the first equation by 
about six millions and to prolong the period of growth by about forty years. 
The approximate limit of 193 millions in the symmetric form is reached about 
the year 2090; while the approximate limit of 199 millions of the skew form 
is not arrived at until about the year 2130. 

The positive sign of s makes for a decreasing acceleration of the rate of in- 
crease during the earlier phases of growth and for an increasing retardation of 
this rate during the later phases, the value of q fixing the point of transition 
in the year 1861. This general epoch has often been cited by sociologists as 
marking the shift from a dominantly rural-agricultural civilization to a domi- 
nantly urban-industrial one. The point at which the change takes place has, 
to my knowledge, never before been defined mathematically. 

Both curves fit the observations excellently, as shown by the percentage 
deviations of Table V(a). The forecasted growth presented in Table V(b) 
is based on the skew ordinates, the formula being 


$$P_t = p_t\,(P_{14}/p_{14})^{1/(t-14)}, \qquad (43)$$


where P denotes the actual population series, observed or predicted, and p,
the skew ordinates. The assumptions of the formula are two: first, that it
is the observed population $P_{14}$ which initiates the forecasted series; and, second,
that the influence of the correction factor $P_{14}/p_{14}$ diminishes with the time.
The extrapolations of both the skew and symmetric formulas contrast with 
the results obtained by Doctors Dublin and Lotka, who predict a stationary 
population of 150 millions by 1970. For the same year, the ordinates of the
symmetric and skew curves both exceed this figure, the one by 15.66, and the
other by 18.42 millions.
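
The working of formula (43) may be sketched as follows, taking the census count and skew ordinate of 1930 (t = 14) from Table V(a) and the skew ordinates of later decades from Table V(b). The function name is illustrative, and the exponent 1/(t − 14) is the reading of the formula that reproduces the printed forecasts.

```python
P14 = 123.068   # census count of 1930, in millions
p14 = 122.23    # skew ordinate of 1930

def forecast(p_t, t):
    """Formula (43): skew ordinate times a correction factor that fades with t."""
    return p_t * (P14 / p14) ** (1 / (t - 14))

# Skew ordinates for 1940, 1950, 1960 (t = 15, 16, 17) from Table V(b).
for t, p_t in ((15, 136.26), (16, 148.78), (17, 159.52)):
    print(1790 + 10 * t, round(forecast(p_t, t), 2))
```

The correction factor for 1940 is the full ratio of observation to ordinate in 1930; for 1950 it is the square root of that ratio, for 1960 the cube root, and so on toward unity.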

The limit of 150 millions referred to was arrived at by analysis of current 
tendencies in birth and death rates. The argument is that current birth rates 
are spuriously high and current death rates spuriously low because of the 
abnormally high proportion of men and women in the reproductive ages. This 
circumstance is due, in part, to the influx in the past of immigrants from com- 
munities having a high normal birth rate, and, in part, to the high birth rates 
of preceding generations of parents in this country. 

After computation of the necessary corrections has been made, the true rate 
of natural increase of the white population for the registration area of the United 
States for the year 1920 is seen to be only about 5.4 per thousand instead of the 
10.7 per thousand indicated by the crude rates. For the year 1930, the actual 
rate of increase is 7.5 per thousand; while the corrected or true rate turns out 
to be virtually zero. Under the interpretation of the authorities cited, the 
spurious excess of births over deaths will be entirely dissipated by the year 
1970, with the result of the stationary population predicted. 

The hazard peculiar to this method of inference arises from two assumptions 
that are made: first, that the present collection and registration of vital statis- 
tics is sufficiently reliable to make precise estimate of the true rate of natural 
increase possible; second, that the tendencies of fecundity and mortality ex- 
hibited by current data are stable. 

With respect to the first assumption, the authors have this to say: 

“One factor of safety of unknown magnitude remains. There is still some 
degree of laxity in the registration of births, and the figures of the true rate of 
natural increase may, on that account, be somewhat larger than recorded 
above.” 

The caution of the authors in this statement is in contrast with the uncritical 
acceptance of their results by those who fail to grasp the implications of 
technique. 

Concerning the second assumption, it may be pointed out that many of the 
tendencies exhibited by current data must be regarded as statistically re- 
versible. Falling birth rates due to drift of population to cities, to postpone- 
ment of marriage on the part of professional classes, to the increasing cost of 
child culture, to the urbanization of rural life and to the restriction of immigra-
tion may be definitely altered by reversals in tendency. The flow of popu-
lation may move into extraurban and subrural districts, where birth rates are
more favorable to increase. The cost of child culture may, in part, be socially 
assumed. Improvement in economic conditions may lessen the drain on the 
resources of the family. The tendency for rural birth rates to fall may be 
checked. Immigration may increase with improving economic conditions. 
Death rates may be further reduced in many age classes and for many causes. 
In fine, when we attempt to project into the future the components that
determine the trend of natural increase, we encounter risks which vastly exceed
those involved in the projection of the population series itself. Most of the
data from which component trends must be determined cover but a brief period
of time; while population data extend back for a century and a half. In this
connection, it is not impertinent to inquire into the criterion of relevance that will
warrant a rejection of the items of the very series we are seeking to forecast.

[Figure. Legend: observations; skew ordinates.]

It is a cardinal principle of logistic theory that the growth of population 
depends primarily on the continued supply of basic resources, physical and 
social, and that the dissipation of these resources is registered in the growth 
rate of the population itself. Any tendency of a population series toward 
skewness, that is, toward departure from the symmetric type of growth, is 
more likely to persist if it is systematic in character. The skew forms of the 
logistic function which we have developed permit us to measure any existing 
systematic tendency of the data toward skewness, and, therefore, to improve 
on the symmetric expectation of future growth. 

In the case of the United States population, the evidence of skewness, insofar 
as it bears on the problem of expectation, is adverse to the conclusion that the 
ultimate limit of growth will be less than the symmetric asymptote. Conceding 
the light that the analysis of current tendencies may throw on the probable 
occurrence of future deviations from trend, the best criterion of long-time 
growth remains the logistic projection. 

This statement, to be sure, does not relieve us of the necessity for recognizing 
the nature of the hazard that inheres in making a prediction from a trend
extrapolation. The hazard involved in this type of inference arises from the 
assumption that the basic conditions of growth are stable, or, in other words, 
that the values of the parameters of the forecasting formula will remain sub- 
stantially unchanged with the inclusion of new observations. Time alone can 
provide the final test of the continued validity of this assumption. 























XV. The Law of Organic Growth. The law of organic growth in its most 
general form may be written: 


$$p = L + H : \left[1 + 10^{\,a + bt + s_1u_1 + s_2u_2 + s_3u_3}\right], \qquad (44)$$
where $u_1 = \sin[m_1(t + q_1)]$; $u_2 = \log[1 + m_2^2(t + q_2)^2]$; $u_3 = \sqrt{1 + m_3^2(t + q_3)^2}$.


For most practical purposes, the evaluation of thirteen parameters is out of
the question; hence, the restricted forms α, β, and γ, equation (18), will be the
ones most generally employed.

I have made use of the term law of organic growth with reference to the logistic 
forms developed because I believe these functions to be the best means yet 
devised for the representation of the sequential changes which living organisms 
regularly manifest as individuals or societies. It states, in a quantitative form, 
all that is qualitatively implied by the so-called “law of diminishing returns” as 
this is commonly invoked by economists. The special sense in which I have 
used the term law may be expressed as follows: 
A statistical law is a mathematical generalization on the behavior of a system of
observations such that the implications of the formula are in accord with the assump- 
tions basic to the phenomenon observed, and such that evaluations of the parameters 
of the formula determined from random samples are mutually consistent. 

A statistical law, then, posits a system of relations manifesting itself in the 
form of observations which must be subjected to analysis before the true nature 
of their interrelations can be inferred. It expresses a probable, rather than a 
certain, inference; but, within the limitations of its claim to precision, it leaves 
reason no more free to reject its specification of reality than does a law of 
mechanics. Indeed, the point is still in dispute as to whether any law of science 
can be more than a statement of probabilities.

In contradistinction, the term empirical formula is properly restricted to cover 
the representation of the single set of observations at hand, and bears no neces- 
sary relation to any larger system. A sufficient test of an empirical formula is, 
therefore, the test of fit. 

We may fit an indefinite number of formulas to a population series and obtain 
satisfactory results so far as agreement is concerned; but, on extrapolating, the 
same formulas will yield results that are patently absurd. The backward 
extrapolation for the population of the United States shown in Table V(b) 
represents the known facts as closely as could be expected when we take into 
consideration that census enumerations include aboriginal and immigrant 
populations as well as native born. Certainly, no random empirical formula, 
selected on the ground of goodness of fit, could be expected to yield as satis- 
factory a result. 

Logistic theory does not, then, profess to guarantee infallibility of prediction. 
A population is not a mere aggregate of unrelated individuals inhabiting a 
restricted area, but a unified organization which grows by the utilization of 
total resources. When the supply of resources is profoundly disturbed or the 
basis of organizational unity destroyed, then the basis of prediction also is 
destroyed. And such reasoning is by no means peculiar to the sphere of social 
organization; for the integrity of any purely mechanical system is likewise 
conditioned by the assumption that the basis of coherence persists. 

At this point, those in whom the speculative disposition is strong may query: 
if statistical prediction does not yield a certain result, is it, in the final analysis, 
superior to the ready and far less expensive method of guessing? 

In answer, I can only say that, a posteriori, we can always, among a sufficiently 
large batch of guessers, find someone who has guessed well; but how, a priori, 
are we to know the good guesser from the poor? A population series consists 
of definite magnitudes, and any prediction of its development must result in 
the selection, out of a vast array of possible magnitudes, that which is most 
consistent with all the known facts. The gambler may elect to hazard his 
stake on the result of a random estimate; but the prudent will give heed to the 
exacting, if laborious, procedure of mathematical analysis. 
ADDENDUM


Another solution of the theoretical problem stated in Section I may here be 
noted. 


Given, as before, the function y = f(x, a, b, ···), we may, by assigning three
approximate values to each parameter, compute $3^p$ sets of values for the function
y, thus:

$$y_{11} = f(x, a_1b_1\cdots); \quad y_{12} = f(x, a_1b_2\cdots); \quad y_{13} = f(x, a_1b_3\cdots); \ \text{etc.}$$


From the observations Y, we may compute $3^p$ sets of the residuals y − Y;
and from these several sets of residuals, the corresponding standard errors of
estimate, σ, may be computed for each set of values of the function y; thus, we
have:


$$\sigma_{11} = \phi(Y, x, a_1b_1)$$
$$\sigma_{12} = \phi(Y, x, a_1b_2)$$
$$\sigma_{13} = \phi(Y, x, a_1b_3)$$


Restricting the parameters to a, b, and holding a constant, we observe that
the values $\sigma_{11}^2$, $\sigma_{12}^2$, $\sigma_{13}^2$ must vary with the assigned values of the parameter b,
and take a minimum value when b takes its true or most probable value. As the
errors in the approximation to b increase positively and negatively without limit,
the computed values of $\sigma^2$ will tend toward the infinite. They may, therefore,
be assumed to lie on the arc of a parabola whose equation is a quadratic function
of a, b; hence, we may form the following equations of representation:


$$\sigma_{11}^2 = k_{11} + l_{11}a_1 + m_{11}a_1^2.$$
$$\sigma_{12}^2 = k_{12} + l_{12}a_1 + m_{12}a_1^2.$$
$$\sigma_{13}^2 = k_{13} + l_{13}a_1 + m_{13}a_1^2.$$
By addition, we have,
$$\sigma_{11}^2 + \sigma_{12}^2 + \sigma_{13}^2 = k_{11} + k_{12} + k_{13} + (l_{11} + l_{12} + l_{13})a_1 + (m_{11} + m_{12} + m_{13})a_1^2.$$


By appropriate variations in subscript, similar equations may be written in
$a_2$ and $a_3$, thus:


$$\sigma_{21}^2 + \sigma_{22}^2 + \sigma_{23}^2 = k_{21} + k_{22} + k_{23} + (l_{21} + l_{22} + l_{23})a_2 + (m_{21} + m_{22} + m_{23})a_2^2.$$
$$\sigma_{31}^2 + \sigma_{32}^2 + \sigma_{33}^2 = k_{31} + k_{32} + k_{33} + (l_{31} + l_{32} + l_{33})a_3 + (m_{31} + m_{32} + m_{33})a_3^2.$$


These three equations are all of the quadratic form, and may be conveniently
written as follows:


$$A_1 = K_1 + L_1a_1 + M_1a_1^2.$$
$$A_2 = K_1 + L_1a_2 + M_1a_2^2.$$
$$A_3 = K_1 + L_1a_3 + M_1a_3^2.$$
By precisely similar reasoning, the following equations in b may be developed:
$$B_1 = K_2 + L_2b_1 + M_2b_1^2.$$
$$B_2 = K_2 + L_2b_2 + M_2b_2^2.$$
$$B_3 = K_2 + L_2b_3 + M_2b_3^2,$$


where


$$B_1 = \sigma_{11}^2 + \sigma_{21}^2 + \sigma_{31}^2; \quad B_2 = \sigma_{12}^2 + \sigma_{22}^2 + \sigma_{32}^2; \quad B_3 = \sigma_{13}^2 + \sigma_{23}^2 + \sigma_{33}^2.$$


Since the values of $a_1$, $a_2$, $a_3$ and $b_1$, $b_2$, $b_3$ are assigned, the two sets of equations
may each be simultaneously solved to obtain values for $K_1$, $L_1$, $M_1$ and $K_2$, $L_2$, $M_2$.
To obtain the conditions for A = a minimum, B = a minimum, we differentiate
with respect to a and b, as follows:


$$D_a(A) = L_1 + 2M_1a; \qquad D_b(B) = L_2 + 2M_2b.$$


Setting these two equations equal to zero and solving, we obtain the adjusted
values of a and b, thus:


$$a = -L_1 : 2M_1; \qquad b = -L_2 : 2M_2.$$
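
The procedure for two parameters may be sketched in code. The example is illustrative only: the straight line y = a + bx, the observations, and the trial values are hypothetical and are chosen so that the sums A and B are exactly quadratic; σ² is taken here as the mean squared residual, and the parabola is passed through the three points by divided differences.

```python
def parabola_vertex(xs, ys):
    """Abscissa -L/(2M) of the vertex of K + L*x + M*x**2 through three points."""
    (x1, x2, x3), (y1, y2, y3) = xs, ys
    d12 = (y2 - y1) / (x2 - x1)
    d23 = (y3 - y2) / (x3 - x2)
    M = (d23 - d12) / (x3 - x1)
    L = d12 - M * (x1 + x2)
    return -L / (2 * M)

def quadratic_step(f, xs, Ys, a_vals, b_vals):
    """One adjustment of (a, b) from three trial values of each parameter."""
    def sigma2(a, b):
        return sum((f(x, a, b) - Y) ** 2 for x, Y in zip(xs, Ys)) / len(xs)

    s2 = [[sigma2(a, b) for b in b_vals] for a in a_vals]   # 3 x 3 grid
    A = [sum(row) for row in s2]                            # sums over b
    B = [sum(s2[i][j] for i in range(3)) for j in range(3)] # sums over a
    return parabola_vertex(a_vals, A), parabola_vertex(b_vals, B)

# Hypothetical illustration: exact observations from a = 2.0, b = 0.5.
def line(x, a, b):
    return a + b * x

xs = [-2, -1, 0, 1, 2]
Ys = [line(x, 2.0, 0.5) for x in xs]
a_adj, b_adj = quadratic_step(line, xs, Ys, (1.0, 1.5, 3.0), (0.0, 0.4, 1.0))
print(a_adj, b_adj)
```

For a function that is not linear in its parameters the sums are only approximately parabolic, so the step would be repeated with trial values regrouped about the adjusted ones.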


The extension of this method to the case of p parameters is obvious. Assign-
ing three approximations to each parameter, we hold constant a value of one
parameter (say $a_1$), and form all possible combinations of subscripts for the
remaining parameters ($b_1b_2b_3$ with $c_1c_2c_3$ with etc.). This will yield $3^{p-1}$ values
of $\sigma^2$, each of which is associated with $a_1$. Repeating this process, we can form
similar sets of values of $\sigma^2$ by association with $a_2$ and $a_3$. We can then form the
sums $A_1 = \phi(Yxa_1bc\cdots)$; $A_2 = \phi(Yxa_2bc\cdots)$; $A_3 = \phi(Yxa_3bc\cdots)$. In all,
$3 \times 3^{p-1}$ or $3^p$ distinct determinations of $\sigma^2$ will be required. In like manner,
the equations for $B_1$, $B_2$, $B_3$ and $C_1$, $C_2$, $C_3$, etc. are formed. The solutions for the
adjusted values a, b, c, ··· follow directly.

Since the method of solution given in Part I requires the computation of but
$2^p$ values of $\sigma^2$, it is evident that the method of this section is the more onerous
when considering the determination of a single set of adjusted values of param-
eters, the excess being of the order $3^p : 2^p = (1.5)^p$. However, being more pre-
cise, the present method will require fewer approximations to arrive at satisfac-
tory values of the parameters sought. In other words, the mathematical
advantage of economy lies with the theta technique; while the advantage of
precision lies with the quadratic technique.
ON A METHOD FOR EVALUATING THE MOMENTS OF A
BERNOULLI DISTRIBUTION¹


By Everett H. Larguier, S.J.


1. The moments (per unit frequency) of a frequency distribution have long
been regarded as useful characteristics of the distribution. If we denote the
s-th moment about the arithmetic mean by $\mu_s$, we have for the Bernoulli distribution


$$\mu_s = \sum_{x=0}^{n} t^s f(x),$$

where $t = x - np$ and $f(x) = \binom{n}{x} p^x q^{n-x}$.


To evaluate the s-th moment about the arithmetic mean has always been a
laborious task. Karl Pearson² gave the s-th moment about the arithmetic
mean as,

$$(1) \qquad \mu_s = \left[\frac{d^s}{dx^s}\left(q e^{-px} + p e^{qx}\right)^n\right]_{x=0},$$

which he said at that time was perhaps the easiest expression for obtaining these
moment coefficients by successive differentiation. Romanovsky,³ however,
was able to develop the recursion formula,

$$(2) \qquad \mu_{s+1} = pq\left[ns\mu_{s-1} + \frac{d\mu_s}{dp}\right],$$

for the moments about the mean. Another relation for these moments is

$$(3) \qquad \mu_{s+1} = \sum_{i=0}^{s-1} \binom{s}{i}\left[npq\,\mu_i - p\,\mu_{i+1}\right].$$


Recently Kirkham⁴ gave the expressions for the first eight moments which,
however, are not in a form well adapted for numerical calculation on a machine.





1 Presented to the American Mathematical Society, January 2, 1936. 

2 Karl Pearson, Biometrika, vol. 12 (1918-1919), footnote, p. 270. This expression is 
obtained from the moment-generating function. Obviously this method is exceedingly 
impractical for numerical calculations. 

3 V. Romanovsky, ‘‘Note on the moments of the binomial (p + q)" about its mean,” 
Biometrika, vol. 15 (1923). Recently this expression was given a simple proof by A. T. 
Craig (Bull. Amer. Math. Soc., vol. 40, pp. 262-264) and extended to the Poisson case. 

4 W. J. Kirkham, "Moments about the arithmetic mean of a binomial frequency distri-
bution," Annals of Mathematical Statistics, vol. VI, pp. 96-101.
2. It is the purpose of this paper to express the s-th moment about the
arithmetic mean in the form


$$(4) \qquad \mu_s = \sum_{t=1}^{s} F_{s,t}(n)\,p^t,$$


where $F_{s,t}(n)$ are determinable functions of n dependent on s and t. We note
here that p and q are the probabilities of the success and failure of an event in
a single trial.

Since we know that $\mu_2 = npq$ and $\mu_1 = 0$, it is evident that the part of (2)
enclosed in [ ] will be of degree 2 less than s + 1 in p and hence (4) will satisfy
as a representation of the moment.


3. To obtain a recursion formula for the functions $F_{s,t}(n)$ we differentiate (4)
with respect to p. This gives


$$\frac{d\mu_s}{dp} = \sum_{t=1}^{s} t\,F_{s,t}(n)\,p^{t-1}.$$


By (2) we may then write


$$\sum_{t=1}^{s+1} F_{s+1,t}(n)\,p^t = p(1-p)\,ns \sum_{t=1}^{s-1} F_{s-1,t}(n)\,p^t + p(1-p) \sum_{t=1}^{s} t\,F_{s,t}(n)\,p^{t-1}$$

$$= ns \sum_{t=2}^{s} F_{s-1,t-1}(n)\,p^t - ns \sum_{t=3}^{s+1} F_{s-1,t-2}(n)\,p^t + \sum_{t=1}^{s} t\,F_{s,t}(n)\,p^t - \sum_{t=2}^{s+1} (t-1)\,F_{s,t-1}(n)\,p^t.$$


Since this is an identity in p, we have immediately the following recursion
formula for determining $F_{s,t}(n)$:


$$(5) \qquad F_{s,t}(n) = n(s-1)F_{s-2,t-1}(n) - n(s-1)F_{s-2,t-2}(n) + t\,F_{s-1,t}(n) - (t-1)F_{s-1,t-1}(n)$$

in which

$$(6) \qquad F_{0,0}(n) = 1; \ \text{and} \ F_{s,t}(n) = 0 \ \text{for} \ \begin{cases} t > s; \\ t < 1,\ s > 0; \\ t = 1,\ s = 1. \end{cases}$$


These definitions arise from the known values of the moments and the condi- 
tions imposed by the identity in p. 

By means of (5) and (6) we are able to obtain very readily the values for
$F_{s,t}(n)$ which are given in Table I.
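
The recursion lends itself to direct machine computation. The following sketch implements (5) with the conditions (6) in exact integer arithmetic and then forms the moments by (4). The function names are illustrative; n = 378 and p = .06785 are the values of the numerical illustration in Section 4; and the one convention added here, that $F_{0,t}(n)$ vanishes for every t other than 0, is an assumption needed to start the recursion.

```python
from functools import lru_cache

N = 378        # number of trials in the illustration of Section 4
P = 0.06785    # probability of success in a single trial

@lru_cache(maxsize=None)
def F(s, t, n=N):
    """Coefficient F_{s,t}(n) from recursion (5) with conditions (6)."""
    if s == 0 and t == 0:
        return 1
    # Zero outside 1 <= t <= s, for s <= 0, and for s = t = 1 (mu_1 = 0).
    if s <= 0 or t < 1 or t > s or (s == 1 and t == 1):
        return 0
    return (n * (s - 1) * F(s - 2, t - 1, n)
            - n * (s - 1) * F(s - 2, t - 2, n)
            + t * F(s - 1, t, n)
            - (t - 1) * F(s - 1, t - 1, n))

def moment(s, n=N, p=P):
    """s-th moment about the mean by (4): the sum of F_{s,t}(n) * p**t."""
    return sum(F(s, t, n) * p ** t for t in range(1, s + 1))

print([F(4, t) for t in range(1, 5)])
print(moment(2), moment(3), moment(4))
```

The second and third moments so obtained coincide with npq and npq(q − p), which affords an independent check of the recursion.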
TABLE I
Values of $F_{s,t}(n)$

 s   F_{s,1}(n)   F_{s,2}(n)       F_{s,3}(n)                 F_{s,4}(n)

 1   0            0                0                          0
 2   n            −n               0                          0
 3   n            −3n              2n                         0
 4   n            −7n + 3n²        12n − 6n²                  −6n + 3n²
 5   n            −15n + 10n²      50n − 40n²                 −60n + 50n²
 6   n            −31n + 25n²      180n − 180n² + 15n³        −390n + 415n² − 45n³
 7   n            −63n + 56n²      602n − 686n² + 105n³       −2100n + 2590n² − 525n³
 8   n            −127n + 119n²    1932n − 2394n² + 490n³     −10206n + 13895n² − 3850n³ + 105n⁴

 s   F_{s,5}(n)                                F_{s,6}(n)

 1   0                                         0
 2   0                                         0
 3   0                                         0
 4   0                                         0
 5   24n − 20n²                                0
 6   360n − 390n² + 45n³                       −120n + 130n² − 15n³
 7   3360n − 4270n² + 945n³                    −2520n + 3234n² − 735n³
 8   25200n − 35700n² + 10990n³ − 420n⁴        −31920n + 46004n² − 14770n³ + 630n⁴

 s   F_{s,7}(n)                                F_{s,8}(n)

 1   0                                         0
 2   0                                         0
 3   0                                         0
 4   0                                         0
 5   0                                         0
 6   0                                         0
 7   720n − 924n² + 210n³                      0
 8   20160n − 29232n² + 9520n³ − 420n⁴         −5040n + 7308n² − 2380n³ + 105n⁴
With this table it is a relatively easy task to evaluate the first eight moments 
with the aid of a calculating machine. 


4. As an illustration of the preceding we propose to evaluate the first eight
moments about the arithmetic mean for the binomial, $(.06785 + .93215)^{378}$.
We first evaluate the coefficients $F_{s,t}(n)$.
TABLE II
Values of $F_{s,t}(378)$⁵

 s   F_{s,1}(378)   F_{s,2}(378)   F_{s,3}(378)      F_{s,4}(378)          F_{s,5}(378)

 1       0               0              0                  0                     0
 2     378            −378              0                  0                     0
 3     378          −1,134            756                  0                     0
 4     378         426,006       −852,768            426,384                     0
 5     378       1,423,170     −5,696,460          7,121,520            −2,848,608
 6     378       3,560,382    784,501,200     −2,371,307,400         2,374,868,160
 7     378       7,977,690  5,573,275,090    −27,986,054,000        50,430,749,000
 8     378      16,955,190 26,123,640,500  1,937,705,370,000    −7,986,171,610,000

 s   F_{s,6}(378)           F_{s,7}(378)           F_{s,8}(378)

 1        0                      0                      0
 2        0                      0                      0
 3        0                      0                      0
 4        0                      0                      0
 5        0                      0                      0
 6   −791,622,720                0                      0
 7   −39,236,327,400        11,210,379,300              0
 8   12,070,808,800,000     −8,064,644,270,000     2,016,161,070,000
Then running off the powers of p, we have:


 p  = .067 85                    p⁵ = .000 001 437 968 13
 p² = .004 603 622 5             p⁶ = .000 000 097 566 137 6
 p³ = .000 312 355 787           p⁷ = .000 000 006 619 862 44
 p⁴ = .000 021 193 340 1         p⁸ = .000 000 000 449 157 667

Applying (4) we have 


5 In this table, as well as in the one that follows, all values are correct to nine signifi- 
cant figures. 
TABLE III
Values of $p^t F_{s,t}(378)$

                      s = 2          s = 3          s = 4          s = 5

 p F_{s,1}(378)     25.6473        25.6473        25.6473        25.6473
 p²F_{s,2}(378)     −1.7401693     −5.2205079   1961.17087     6551.73743
 p³F_{s,3}(378)      0.             0.2361410   −266.36702    −1779.32225
 p⁴F_{s,4}(378)      0.             0.             9.03650      150.92880
 p⁵F_{s,5}(378)      0.             0.             0.            −4.09621

 μ_s                23.9071307     20.6629331   1729.48765     4944.89507

                      s = 6          s = 7           s = 8

 p F_{s,1}(378)       25.647          25.65            …
 p²F_{s,2}(378)    16390.655       36726.…             …
 p³F_{s,3}(378)   245043.490     1740844.7        8159870.…
 p⁴F_{s,4}(378)   −50255.924     −593117.…       41066448.…
 p⁵F_{s,5}(378)     3414.985          …         −11483860.…
 p⁶F_{s,6}(378)      −77.…         −3828.…        1177702.…
 p⁷F_{s,7}(378)        0.             74.…         −53386.8
 p⁸F_{s,8}(378)        0.              0.              …

 μ_s              214541.…       1253242.57      38945760.8


This gives us the desired moments about the arithmetic mean of the binomial


$$(.06785 + .93215)^{378},$$


$\mu_1$ to $\mu_8$. These values may be rapidly checked by applying (3).


Saint Louis University,
Saint Louis, Missouri.














A METHOD OF DETERMINING THE REGRESSION CURVE WHEN 
THE MARGINAL DISTRIBUTION IS OF THE NORMAL 
LOGARITHMIC TYPE 


By Carl-Erik Quensel
Assistant at the Statistical Institute of the University of Lund, Sweden 


In a paper¹ in this Journal Professor S. D. Wicksell gave the general outlines
of a new method of calculating the regression lines. This problem was later
on treated in detail by Dr. Walter Andersson.² His method was to develop
the formulas for the regression lines into a series of orthogonal polynomials 
under the assumption that the marginal distribution of the independent 
variate belonged to certain mathematically defined distributions, and to de- 
termine the constants with the aid of the method of the least squares. 

Among other cases he treated also the case where the marginal distribution 
was of the normal logarithmic type: 


$$(1) \qquad F(x) = \frac{1}{\sigma\sqrt{2\pi}\,(x - a)}\; e^{-\frac{1}{2}\left[\frac{\log (x-a) - l}{\sigma}\right]^2}$$


But as his method is entirely different from the method I shall give here, I 
will not go any further into the method used by Dr. Andersson. 

When the correlation surface F(x, y) of the variates x and y is given, and then
of course also the marginal distribution F(x) of x, it is known that the mean
$\bar{y}_x$ of the dependent variate y in an infinitely small array with the value of x
between x and x + dx is given as a function of the independent variate x by
the following formula (2):


$$(2) \qquad \bar{y}_x = \frac{\int y\,F(x, y)\,dy}{\int F(x, y)\,dy}$$


In this formula the integrals are to be extended over the whole domain of
the variation of y.


If now we make any transformation of x by introducing a new variate u,
related to x by the formula u = ψ(x), where we must suppose that u is a one-
valued function of x and conversely, the distribution f(u, y) of the variates u and
y is given by the relation


$$f(u, y)\,du\,dy = F(x, y)\,dx\,dy$$





1 S. D. Wicksell. Remarks on Regression. Annals of Mathematical Statistics, 1930.
2 Walter Andersson. Researches into the theory of Regression. Meddelande från
Lunds Astronomiska Observatorium. Ser II. N:r 64.
Writing the formula (2) in the following form:


$$\bar{y}_x = \frac{\int y\,F(x, y)\,dx\,dy}{\int F(x, y)\,dx\,dy},$$


we see at once that the mean $\bar{y}_x$ can be given as the following function of u:

$$(4) \qquad \bar{y}_u = \frac{\int y\,f(u, y)\,dy}{\int f(u, y)\,dy}$$


This relation, of course, is self-evident. The mean of the dependent variate
in an array of the independent variate will be unchanged, when we change the
variate x for another variate u, related to x by a one-valued function.

The problem of finding the regression line of the mean $\bar{y}_x$ can in such a way
be much simplified, if it is possible to make a favorable transformation of the
independent variate x.

As shown by Professor Wicksell³ we may, under certain conditions concerning
the marginal distribution f(u), write the expression of the regression line in
the following form:


$$(5) \qquad \bar{y}_u = \sum_{n=0}^{\infty} (-1)^n\,\frac{\lambda_{n,1}}{n!}\,\frac{f^{(n)}(u)}{f(u)},$$


where the $\lambda_{n,1}$ coefficients are the seminvariants of the distribution of u and y.
The conditions which the function f(u) must satisfy are among others that
the function and all its derivatives are continuous in the domain of variation and
that the function and its derivatives vanish at the limits of that domain.
These conditions are satisfied by the normal curve of error.
In the case where the distribution of u is normal, the derivatives $f^{(n)}(u)$ take
the following form:


$$(6) \qquad f^{(n)}(u) = (-1)^n H_n(u)\,f(u),$$


where the expressions $H_n(u)$ are the well known Hermite polynomials.
The formula (5) takes the following simple form.


$$(7) \qquad \bar{y}_u = \sum_{n=0}^{\infty} \frac{\lambda_{n,1}}{n!}\,H_n(u)$$





3 S. D. Wicksell. Analytical Theory of Regression. Meddelande från Lunds Astrono-
miska Observatorium. Ser II. N:r 69.

If we can change the given marginal distribution F(x) by a favorable substi-
tution u = ψ(x) into a normal curve, and if, this substitution made, we can
calculate the coefficients $\lambda_{n,1}$ from the moments or other known characteristics
of the given correlation distribution, F(x, y), it is possible to express the regres-
sion line as the formula (8) shows:


$$(8) \qquad \bar{y}_x = \sum_{n=0}^{\infty} \frac{\lambda_{n,1}}{n!}\,H_n[\psi(x)]$$

It must be observed that the polynomials $H_n[\psi(x)]$ are orthogonal with regard
to the distribution F(x) of the independent variate x. We have

$$\int H_i[\psi(x)]\,H_j[\psi(x)]\,F(x)\,dx = \int H_i(u)\,H_j(u)\,f(u)\,du = 0, \qquad i \neq j.$$
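
The orthogonality relation can be verified numerically. The sketch below assumes that u has been standardized to zero mean and unit standard deviation, so that the polynomials defined by (6) are the Hermite polynomials generated by the recurrence $H_{n+1}(u) = uH_n(u) - nH_{n-1}(u)$; the integral is approximated by the trapezoidal rule, and all names are illustrative.

```python
import math

def hermite(n, u):
    """H_n(u) for the standardized normal curve, by the three-term recurrence."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, u
    for k in range(1, n):
        h_prev, h = h, u * h - k * h_prev
    return h

def normal_density(u):
    """Normal curve of error with zero mean and unit standard deviation."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def inner(i, j, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal estimate of the integral of H_i(u) H_j(u) f(u) du."""
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * du
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * hermite(i, u) * hermite(j, u) * normal_density(u)
    return total * du

print(inner(2, 3), inner(2, 4), inner(3, 3))
```

Under the same assumption, the integral for i = j = n equals n!, which is the normalizing constant met with when coefficients are determined by least squares.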


It will perhaps not be possible in all cases to calculate the $\lambda_{n,1}$ coefficients,
when we have transformed the marginal distribution into the normal curve,
but in one case it is rather simple to calculate these coefficients from the mo-
ments given.

The case alluded to is the one where the variate u is given from x by the
relation u = log(x − a), that is, where the marginal distribution is of the so
called normal logarithmic type (1).

In that case it is possible to calculate the $\lambda_{n,1}$ coefficients from the marginal
moments $\nu_{n,0}$ and from the correlation moments of the type $\nu_{n,1}$.

We suppose that the marginal distribution is of the logarithmic type and
that from the moments of the x distribution we have determined the three
constants a, σ, and l in the usual manner.⁴

Then we calculate from the given correlation distribution the moments $\nu_{n,0}$
about the point x = a and the correlation moments $\nu_{n,1}$ about the point x = a
and y = $m_y$ (the mean value of the y-variate).

From these moments it is possible to calculate the $\lambda_{n,1}$ coefficients in the
following way.

The characteristic function of u and y is given by the following relation:

$$U(t, \omega) = \exp\Big(\sum_{k,l} \frac{\lambda_{k,l}}{k!\,l!}\,(it)^k (i\omega)^l\Big) = \iint e^{i(tu + \omega y)}\,f(u, y)\,du\,dy \tag{9}$$


where the integrals are extended over the whole domain of variation.

If the distribution of u is according to the normal law, we have $\lambda_{k,0} = 0$ for k ≥ 3, but in the calculations here it is not at all necessary to suppose anything about these higher seminvariants. On the other hand, the correlation distribution f(u, y) is obtained from the characteristic function by the inversion theorem.

$$f(u, y) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\Big(-itu - i\omega y + \sum_{k,l} \frac{\lambda_{k,l}}{k!\,l!}\,(it)^k (i\omega)^l\Big)\,dt\,d\omega \tag{10}$$


4 How these are to be determined is shown in Pae-Tsi Yuan. On the Logarithmic Frequency Distribution and the Semi-Logarithmic Correlation Surface. Annals of Mathematical Statistics, 1933.

But we can also get the following relation

$$\int e^{itu} f(u, y)\,du = \frac{1}{2\pi}\int \exp\Big(-i\omega y + \sum_{k,l} \frac{\lambda_{k,l}}{k!\,l!}\,(it)^k (i\omega)^l\Big)\,d\omega \tag{11}$$


I will make use of this last relation (11) between the characteristic function and the distribution function in what follows.

The moments $\nu_{n,1}$ of the distribution F(x, y) about the point x = a, $y = m_y$ are given by the formula

$$\nu_{n,1} = \iint (x - a)^n (y - m_y)\,F(x, y)\,dx\,dy \tag{12}$$

If we write y instead of $y - m_y$, and instead of x − a we write $e^{bu}$ (b = log e), the expression (12) takes the following form:

$$\nu_{n,1} = \iint e^{nbu}\,y\,f(u, y)\,du\,dy \tag{13}$$


For the marginal moments of x about the point x = a we get

$$\nu_{n,0} = \int (x - a)^n F(x)\,dx = \int e^{nbu} f(u)\,du \tag{14}$$

Comparing this formula (14) with the expression for the characteristic function of the distribution f(u)

$$U(t) = \int e^{itu} f(u)\,du = \exp\Big(\sum_{k} \frac{\lambda_{k,0}}{k!}\,(it)^k\Big) \tag{15}$$

we find the following simple relation

$$\nu_{n,0} = \exp\Big(\sum_{k} \frac{\lambda_{k,0}}{k!}\,(nb)^k\Big) \tag{16}$$


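For the moments of the type $\nu_{n,1}$ we get

$$\nu_{n,1} = \iint e^{nbu}\,y\,f(u, y)\,du\,dy = \int y\,dy \int e^{nbu} f(u, y)\,du \tag{17}$$

If we compare the last integral in the formula (17), $\int e^{nbu} f(u, y)\,du$, with the formula (11) we see that we can write (17) as follows:

$$\nu_{n,1} = \frac{1}{2\pi}\int y\,dy \int \exp\Big(-i\omega y + \sum_{k,l} \frac{\lambda_{k,l}}{k!\,l!}\,(nb)^k (i\omega)^l\Big)\,d\omega \tag{18}$$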


From the sum $\sum_{k,l} \frac{\lambda_{k,l}}{k!\,l!}(nb)^k (i\omega)^l$ we may take out the part $\sum_{k} \frac{\lambda_{k,0}}{k!}(nb)^k$, where l is zero and which therefore does not contain any power of ω, and write the remainder in the following form:

$$\sum_{l} \frac{\Lambda_l}{l!}\,(i\omega)^l$$

where we have

$$\Lambda_1 = \lambda_{1,1}\,nb + \frac{\lambda_{2,1}}{2!}(nb)^2 + \frac{\lambda_{3,1}}{3!}(nb)^3 + \cdots$$

$$\Lambda_l = \lambda_{0,l} + \lambda_{1,l}\,nb + \frac{\lambda_{2,l}}{2!}(nb)^2 + \cdots$$

The integral $\frac{1}{2\pi}\int \exp\big(-i\omega y + \sum_{l} \frac{\Lambda_l}{l!}(i\omega)^l\big)\,d\omega$ may be considered as a frequency distribution $\varphi(y)$ with the seminvariants $\Lambda_l$.
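The formula (18) will thus be written

$$\nu_{n,1} = \nu_{n,0}\int y\,\varphi(y)\,dy \tag{19}$$

and as

$$\int y\,\varphi(y)\,dy = \Lambda_1 = \lambda_{1,1}\,nb + \frac{\lambda_{2,1}}{2!}(nb)^2 + \frac{\lambda_{3,1}}{3!}(nb)^3 + \cdots$$

we get

$$\nu_{n,1} = \nu_{n,0}\,\Lambda_1 \tag{20}$$

or

$$\frac{\nu_{n,1}}{\nu_{n,0}} = \lambda_{1,1}\,nb + \frac{\lambda_{2,1}}{2!}(nb)^2 + \frac{\lambda_{3,1}}{3!}(nb)^3 + \cdots \tag{21}$$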


We see that in the formulas for $\nu_{n,1}/\nu_{n,0}$ we have all the seminvariants $\lambda_{n,1}$ involved. A successive determination of the seminvariants $\lambda_{n,1}$ with the aid of the moments of the same and lower degree is therefore not possible. However, when we use the formula (8) for the regression, we must suppose that the seminvariants $\lambda_{n,1}$ with growing n converge rather soon towards zero.

If the successive differences $\Delta^n(\nu_{n,1}/\nu_{n,0})$ of the quotients $\nu_{n,1}/\nu_{n,0}$ are calculated, it may be possible to judge how far it is possible to go with success. These differences will in most cases diminish rather soon, and we shall therefore in most cases get a value of n beyond which we can suppose that the differences of higher order will all be so small that they can be neglected, and as a consequence all higher seminvariants can be neglected too.

When this value of n has been determined, the n first seminvariants will all be obtained from the n first quotients $\nu_{i,1}/\nu_{i,0}$.
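The last step can be sketched in code. Assuming the series (21) is truncated after N terms, the first N quotients give N linear equations in $\lambda_{1,1}, \ldots, \lambda_{N,1}$ with coefficients $(nb)^k/k!$. The function name, the default `b=1`, and the use of exact rational arithmetic are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from math import factorial


def seminvariants_from_quotients(quotients, b=1):
    """Solve the truncated system (21) for lambda_{1,1}, ..., lambda_{N,1}.

    quotients[n-1] is the quotient nu_{n,1} / nu_{n,0} for n = 1..N.
    """
    size = len(quotients)
    # Row n: coefficients (n*b)^k / k! for k = 1..N, followed by the right-hand side.
    rows = [
        [Fraction(n * b) ** k / factorial(k) for k in range(1, size + 1)] + [Fraction(q)]
        for n, q in enumerate(quotients, start=1)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]
```

With b = 1 and the made-up seminvariants 2, −1, 3, formula (21) gives the quotients 2, 6, 15, and the solver recovers the three values from them.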


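Thus we finally get the regression line as follows:

$$\bar{y}_x = m_y + \sum_{i=1}^{n} \frac{\lambda_{i,1}}{i!}\,H_i[\log (x - a) - l]$$

or in standardized units:

$$\bar{y}_x = m_y + \sum_{i=1}^{n} \frac{\lambda_{i,1}}{i!\,\sigma^i}\,H_i\Big[\frac{\log (x - a) - l}{\sigma}\Big]$$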





THE STANDARD ERROR OF A “SOCIAL FORCE” 
By Stuart C. Dodd
I. Definitions 


In the theory of measurement of social forces certain special cases of frequent 
occurrence where the population shifts from one date of measurement to the 
next require the derivation of appropriate standard error formulae. 

The theory may be briefly restated¹ in equations as follows: any measurable social change, C, in a population, P, may be defined as the difference in mean scores, S, from surveys or measurements on the dates denoted by subscripts

$$C_{2-1} = S_2 - S_1 = \frac{\sum s_2}{P} - \frac{\sum s_1}{P} \tag{1}$$

The momentum of a social change may be defined as the product of its time rate in years and the population that is being changed

$$M_{2-1} = P\,V_{2-1} \tag{2}$$

$$= \frac{P S_2}{Y_{2-1}} - \frac{P S_1}{Y_{2-1}} \tag{2a}$$

where $Y_{2-1}$ is the period from date 1 to date 2 and V is the velocity, or speed of change, in that period. The acceleration of a social change is definable as the rate of change of the velocity of change

$$A = \frac{V_{4-3} - V_{2-1}}{.5\,Y_{(4+3-2-1)}} \tag{3}$$

where each velocity, being an average for its period, is taken as representing the mid-date of that period.

The resultant social force which produces a measured change is now definable as that which accelerates the change in a population. It is measurable as the product of the acceleration and the population.

$$F = AP \tag{4}$$

$$= \frac{P}{.5\,Y_{(4+3-2-1)}}\Big(\frac{S_1}{Y_{2-1}} - \frac{S_2}{Y_{2-1}} - \frac{S_3}{Y_{4-3}} + \frac{S_4}{Y_{4-3}}\Big) \tag{5}$$
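The definitions (1) to (5) translate directly into a few lines of code. The sketch below assumes a constant population P and dates given in years; the function names and the sample numbers in the usage note are illustrative and are not taken from the paper, except where stated.

```python
def velocity(score_start, score_end, years):
    """Velocity of change: difference in mean scores divided by the period."""
    return (score_end - score_start) / years


def momentum(population, score_start, score_end, years):
    """Momentum (2): population times velocity of change."""
    return population * velocity(score_start, score_end, years)


def force(population, scores, dates):
    """Force (4)-(5) from four surveys: population times acceleration (3).

    scores = (S1, S2, S3, S4); dates = (Y1, Y2, Y3, Y4) in years.
    """
    s1, s2, s3, s4 = scores
    y1, y2, y3, y4 = dates
    v_first = velocity(s1, s2, y2 - y1)
    v_second = velocity(s3, s4, y4 - y3)
    # Time between the mid-dates of the two periods: .5 * Y_(4+3-2-1).
    mid_gap = 0.5 * (y4 + y3 - y2 - y1)
    return population * (v_second - v_first) / mid_gap
```

Using the Village A figures from the numerical illustration later in the paper with the average population 43, `momentum(43, 253, 304, 2)` gives 1096.5, which rounds to the 1,097 reported there.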





1 A Controlled Experiment on Rural Hygiene in Syria, Dodd, S. C., Publications of the American University of Beirut, Syria, Social Science Series No. 7, 1934, pp. 336.

Also, A Theory for the Measurement of Some Social Forces, Dodd, S. C., Scientific Monthly, Vol. XLIII, No. 1, July 1936, pp. 58-62.

2 Force thus defined in terms of its effect is a resultant force, i.e., the residual force after deducting all resisting forces from the total force in the direction of the change observed. This formula defines quantitatively and exactly the "net" force not the "gross" force

II. The Sampling error of one case (momentum)


The formulae for the standard errors of sampling for the above concepts, social change, velocity, momentum, acceleration and force, (C, V, M, A, and F) have been published for the case where the population, P, is the same on all dates of measurement. But it is not always possible to observe the ideal experimental technic of holding the population unchanged in number nor to select out individuals common to all the surveys and to neglect the rest. Ordinarily there will be different P's, $P_1$, $P_2$, $P_3$, and $P_4$, at the different dates.

To derive the standard errors of (2) and (4) when P shifts, each P is considered to be a sub-sample³ of the main sample, which is $(P_1 + P_2 + P_3 + P_4)$. The orthodox view of sampling is taken where the sub-samples may differ in size but maintain fixed proportions in each main sample which is drawn from the "parent" population.

Let primes denote an M, or other function of (1) to (5), which is an approximation due to the shifting of the population and the use of an average P.

To simplify and generalize the notation, let k denote the constant term com- 
pounded of P’s and Y’s which is associated with each S. The first subscript 
of k denotes the function, f, which is any particular one of the left hand members 
of equations (1) to (5) and the second subscript denotes the date of its S. Thus, 
from (2a) 


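$$k_{M1} = \frac{-P}{Y_{2-1}} = -k_{M2} \tag{6}$$

Then (2) may be rewritten:

$$M'_{2-1} = S_1 k_{M1} + S_2 k_{M2} \tag{7}$$

$$= \sum_{i=1}^{2} S_i\,k_{Mi} \tag{7a}$$

To derive the standard error of (7) the total differential is:

$$dM'_{2-1} = k_{M1}\,d\Big(\frac{\sum s_1}{P_1}\Big) + k_{M2}\,d\Big(\frac{\sum s_2}{P_2}\Big) \tag{8}$$

If $Q_{12}$ denotes the population common to both dates of measurement so that:

$$P_1 = Q_{12} + Q_1, \qquad P_2 = Q_{12} + Q_2 \tag{9}$$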

producing the change. It thus measures only the observable part of the total forces in the 
situation. The fundamental problem remains, as always in science, to observe more 
adequately, to devise experimental and statistical technics for measuring the different 
forces (in isolation and in combinations) which facilitate or resist the measured change. 

3 The author is indebted to Mr. S. S. Wilks (Princeton) for this method of deriving these standard errors in a fluctuating population.

and, since the differential of a sum is the sum of the differentials of the several terms, (8) becomes

$$dM'_{2-1} = \frac{k_{M1}}{P_1}\Big(\sum^{Q_{12}} ds_1 + \sum^{Q_1} ds_1\Big) + \frac{k_{M2}}{P_2}\Big(\sum^{Q_{12}} ds_2 + \sum^{Q_2} ds_2\Big) \tag{10}$$
1 1 1 


1 1 


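Squaring gives

$$(dM'_{2-1})^2 = \frac{k_{M1}^2}{P_1^2}\Big(\sum^{P_1} ds_1\Big)^2 + \frac{k_{M2}^2}{P_2^2}\Big(\sum^{P_2} ds_2\Big)^2 + \frac{2\,k_{M1} k_{M2}}{P_1 P_2}\Big[\sum^{Q_{12}} ds_1 \sum^{Q_{12}} ds_2 + \sum^{Q_1} ds_1 \sum^{Q_{12}} ds_2 + \sum^{Q_2} ds_2 \sum^{Q_1} ds_1 + \sum^{Q_{12}} ds_1 \sum^{Q_2} ds_2\Big] \tag{11}$$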


On summing and dividing by the number of cases to get the expected values, 
the last three terms in the square brackets vanish. Using the relation where, 
in random sampling, the correlation between two variables is the same as the 


correlation between their means 
ys Se 
S ( 8] _< So 
= SiSe. is Qie Qis ; 


a >= 444, > = 
— 


Qis 7102 
V Qu - Qe 





$$\sigma^2_{M'} = \frac{k_{M1}^2\,\sigma_1^2}{P_1} + \frac{k_{M2}^2\,\sigma_2^2}{P_2} + \frac{2\,k_{M1} k_{M2}\,Q_{12}\,\sigma_1 \sigma_2\,r_{12}}{P_1 P_2} \tag{13}$$

Standard error of momentum when the population shifts


The best estimates of $\sigma_1$ and $\sigma_2$ are the standard deviations of the scores, $s_1$ and $s_2$, and the best estimate of $r_{12}$ is, strictly, the covariance of the common cases divided by the two sigmas. Unless the selection of $Q_{12}$ out of $P_1$ and $P_2$ curtails the range in some way (i.e., $Q_{12}$ is not a random selection), then, except for sampling variation, $\sigma_1$ and $\sigma_2$ are the same in the $Q_{12}$ population as in the $P_1$ and $P_2$ populations so that there is only a sampling discrepancy between the ratio above and the $r_{12}$, the observed correlation between the $s_1$ and $s_2$ scores in the $Q_{12}$ population.


III. The generalized standard error 


The above standard error may be readily generalized. Any of the equations 
(1) to (5) may be expressed as a simple linear sum of the products of a variable, 
S, and its appropriate constant, k. 


$$f = \sum_{i=1}^{n} S_i\,k_{fi} \tag{14}$$

where f is any one of the concepts S, C, V, A, M or F defined by (1) to (5) and n is the number of surveys, or different S's involved, and i denotes each survey in turn from 1 to n. Thus where f means F, (5) becomes:

$$f_F = F' = k_{F1} S_1 + k_{F2} S_2 + k_{F3} S_3 + k_{F4} S_4 = \sum_{i=1}^{4} k_{Fi}\,S_i \tag{15}$$
where 


$$k_{F1} = -k_{F2} = \frac{P_1 + P_2 + P_3 + P_4}{2\,Y_{(4+3-2-1)}\,Y_{(2-1)}} \tag{16a}$$

$$k_{F4} = -k_{F3} = \frac{P_1 + P_2 + P_3 + P_4}{2\,Y_{(4+3-2-1)}\,Y_{(4-3)}} \tag{16b}$$


In the special case when a force, F, has been determined from only three 
surveys using two consecutive periods, n = 3 and 


$$k_{F1} = \frac{P_1 + P_2 + P_3}{1.5\,Y_{(3-1)}\,Y_{(2-1)}} \tag{16c}$$

$$k_{F2} = -\frac{(P_1 + P_2 + P_3)\,(Y_{(3-2)} + Y_{(2-1)})}{1.5\,Y_{(3-1)}\,Y_{(3-2)}\,Y_{(2-1)}} \tag{16d}$$

$$k_{F3} = \frac{P_1 + P_2 + P_3}{1.5\,Y_{(3-1)}\,Y_{(3-2)}} \tag{16e}$$


If the difference between two forces (or other functions, f) has been measured 
in either the same or in different populations and the significance of the differ- 


ence in terms of its standard error is desired, f of (14) can also denote that 
difference. 


$$f_{\Delta F} = F_a - F_b; \qquad f_{\Delta M} = M_a - M_b; \qquad \text{etc.} \tag{17}$$


It is only necessary to write the difference as a linear sum of products of S 
and k on the model of (2a) or (5) to get the k-values for that particular f. 

It is now possible to write the standard error formula for f in a single generalized form that covers all the concepts and their differences as defined in equations (1) to (5), (14) and (17). Observing that (14) is the general case for n surveys of the particular case (7a) where n = 2, it becomes evident that on taking differentials, squaring, summing, and dividing the linear sum of the n terms of (14) there result $n^2$ terms, of which there are n that are variances (times constants) of the sort $k_i^2 \sigma_i^2 / P_i$, and $(n^2 - n)/2$ are different terms, each occurring twice, that are covariances (times constants) of the sort $k_i k_j Q_{ij} \sigma_i \sigma_j r_{ij} / (P_i P_j)$. From these rough considerations as well as from rigorous derivation, the generalized standard error of (14) is found to be:

$$\sigma_f^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{k_{fi}\,\sigma_i}{P_i} \cdot \frac{k_{fj}\,\sigma_j}{P_j}\,Q_{ij}\,r_{ij} \tag{18}$$

The generalized standard error.


Where i and j denote each of the n surveys in turn. There will thus be $n^2$ terms to be summed, namely the number of combinations of i with j including the cases where i = j.

The derivation of (18) as well as its computation from data and its interpretation in special cases can all be made clearer by arranging the terms in a square array as follows:

                     i =        1                 2           ...         n
    Coefficients          k_f1 σ_1/P_1      k_f2 σ_2/P_2      ...   k_fn σ_n/P_n

    j = 1  k_f1 σ_1/P_1   P_1               Q_12 r_12         ...   Q_1n r_1n
                          (   )             (   )                   (   )
        2  k_f2 σ_2/P_2   Q_12 r_12         P_2               ...   Q_2n r_2n
                          (   )             (   )                   (   )
       ...
        n  k_fn σ_n/P_n   Q_1n r_1n         Q_2n r_2n         ...   P_n
                          (   )             (   )                   (   )

To get $\sigma_f$ write the computed values of the coefficients $k_{fi}\sigma_i/P_i$ as captions of rows and of columns and write each computed Qr value in its appropriate cell, noting that in the main diagonal cells the self-correlations are unities and the population common to both column and row surveys, $Q_{ij}$, is the entire population of that survey, as $Q_{ij} = P_i$ when i = j. Thus $Q_{11} = P_1$. Next in each cell's parenthesis enter the product of three factors, namely: a) the cell Qr term, b) the column coefficient, and c) the row coefficient. The sum of these products in the parentheses, $n^2$ in number, is $\sigma_f^2$ of (18).

From the above square array it becomes clear that whenever in (17) the difference of two observed forces, or other functions, is derived from different populations the Q between these populations is zero so that the entire product terms in those cells vanish. Thus in the very simplest and familiar case of comparing two means from different populations, n = 2, $Q_{12} = 0$, k = 1, and (18) reduces to the usual sum of the two variances of the two means

$$\sigma^2_{\text{difference in means}} = \frac{\sigma_1^2}{P_1} + \frac{\sigma_2^2}{P_2} \tag{19}$$
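The square-array procedure amounts to a double sum, which can be sketched as follows. The function names and argument layout (a list of row coefficients plus two n-by-n nested lists for the common populations and correlations) are illustrative assumptions, not the author's notation.

```python
def row_coefficient(k, sigma, population):
    """Caption of a row or column in the square array: k * sigma / P."""
    return k * sigma / population


def generalized_variance(coefficients, common, correlations):
    """Sum the n*n cell products of (18).

    coefficients[i] = k_fi * sigma_i / P_i
    common[i][j]    = Q_ij (with Q_ii = P_i on the diagonal)
    correlations[i][j] = r_ij (with unities on the diagonal)
    """
    n = len(coefficients)
    return sum(
        coefficients[i] * coefficients[j] * common[i][j] * correlations[i][j]
        for i in range(n)
        for j in range(n)
    )
```

As a check against (19), two unrelated samples with standard deviations 3 and 4 and populations 9 and 16 should give 9/9 + 16/16 = 2; taking the coefficients as `row_coefficient(-1, 3, 9)` and `row_coefficient(1, 4, 16)` with zero off-diagonal Q reproduces that value.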


IV. Some special cases 


It should be observed that the above formulae for the standard errors when P shifts all become identical with the simpler formulae previously derived for the case of a constant P. In this case, every $Q_{ij} = P_i = P$, and in the square array (in addition to k's which no longer involve an average P), the Q or P of the cells and the P's in the row coefficients may be omitted as they cancel each other out.

Another special but very frequent case is where the social change is not given in terms of a difference in means, $S_1$ and $S_2$, but in terms of a difference in percentages, as when a literacy rate rises from 30% to 40%. A percentage can be viewed as a mean of a two-category, all-or-none, present-or-absent variable such as: A, non-A (foreign or native born, literate or illiterate, etc.), where A is assigned a value of 1 and non-A a value of 0. Then the sum of the values of A, each times its frequency, divided by the population is both a proportion and a mean. Its standard error in the percentage, p, form of expression is then equal to it in the mean form:

$$\sigma_p = \frac{\sqrt{p\,(100 - p)}}{\sqrt{P}} = \frac{\sigma_s}{\sqrt{P}} \tag{20}$$

(where s = 1 or 0 and $p = \sum s / P$)

so that where $S_i$ in (14) is a percent, $\sqrt{p(1.00 - p)}$ should be substituted for $\sigma_i$ (and $\sigma_j$) in (18). In this case the appropriate formula to use for getting $r_{ij}$ in (18) depends on the nature of the distribution of the variable that is expressed in percentage form. If the distribution is normal, tetrachoric r may be appropriate, while if the S in percentage form is from a two-point distribution, r from a four-fold point surface may be appropriate.

In all the above cases the usual interpretation of the significance of f in respect to sampling errors may be used in entering a normal probability table with a given $\sigma_f$ from (18) and reading the probability of such an f occurring by chance.*

For a numerical illustration of this formula (18), consider the case of two villages, the statistical significance of whose momentums of a social change are to be determined. The data are from a study¹ of Syrian villages where an


* Mr. Wilks comments here that, "there is a more exact and rigorous test for comparing the two sets of S's which enter into a pair of M's or F's which involves some recent statistical theory but it is doubtful if the extra refinement is worth while at this stage of sociometric development."

itinerant Health Clinic in two years changed the average hygienic status of the families in each village by amounts of score (on a scale of 1 to 1000 points, devised for this study) as indicated in the table below.










































































                                                        Village A    Village B
    Mean score in 1931 = $S_1$                              253          321
    Mean score in 1933 = $S_2$                              304          528
    Population (families) in 1931 = $P_1$                    46           46
    Population (families) in 1933 = $P_2$                    40           32
    Standard deviation of scores in 1931 = $\sigma_1$        54           39
    Standard deviation of scores in 1933 = $\sigma_2$        58           70
    Families common to both censuses = $Q_{12}$              40           32
    Correlation of scores from the 2 dates = $r_{12}$       .00          .19
    $k_{M1} = -(P_1 + P_2)/2Y_{(2-1)}$                    -21.5        -19.5
    $k_{M2} = -k_{M1}$                                     21.5         19.5
    $k_{M1}\sigma_1/P_1$                                 -25.24       -16.53
    $k_{M2}\sigma_2/P_2$                                  31.18        42.65
    $Q_{12}\,r_{12}$                                          0         6.08
    $\sigma_{M'}$                                           261          249*
    Momentum = $M'_{2-1}$                                 1,097        4,037
    Significance ratio $M'_{2-1}/\sigma_{M'}$               4.2         16.2
* The calculation of this σ by (18) may be illustrated in detail:

    Village B
                      i =       1                 2
    Coefficients             -16.53             42.65

    j = 1   -16.53           46 (= P_1)         6.08 (= Qr)
                             (12,571)           (-4,286)
        2    42.65           6.08 (= Qr)        32 (= P_2)
                             (-4,286)           (58,208)

    Σ( ) = 62,207 = $\sigma^2_{M'}$;   $\sigma_{M'}$ = 249















The momentum of the movement towards improved hygiene achieved in village A is 4.2 times its standard error, while that of village B is 16.2 times its standard error. The excess momentum of village B over village A is 8.1 times the standard error of their difference in momenta. Since all three of these significance ratios are well over 3 the conclusion is that the observed momenta and difference of momenta are statistically significant and cannot reasonably be due to sampling fluctuations. It may be noted that the significance ratios for the amounts of this social change, the difference in mean scores, are in close agreement with the above figures, being 4.1 and 15.9 for villages A and B respectively, instead of 4.2 and 16.2 as above. These discrepancies of .1 and .3 in the statistical significance of these social changes compared with the corresponding social momenta are accounted for by the fact that the shift in the size of the population is allowed for in our formula for the case of momenta and is not considered in the usual formula for the case of social change.
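The three significance ratios can be recomputed from the tabulated village data with formula (13). The sketch below is illustrative only: the function name and argument order are assumptions, the average of $P_1$ and $P_2$ is used for P as in the table, and the two villages are treated as independent when the momenta are differenced.

```python
from math import sqrt


def momentum_and_error(p1, p2, s1, s2, sd1, sd2, q12, r12, years):
    """Return (momentum, standard error) for a shifting population, per (13)."""
    k = (p1 + p2) / (2 * years)          # k_M2; k_M1 is its negative
    m = k * (s2 - s1)
    variance = (
        k * k * sd1 * sd1 / p1
        + k * k * sd2 * sd2 / p2
        - 2 * k * k * q12 * sd1 * sd2 * r12 / (p1 * p2)
    )
    return m, sqrt(variance)


m_a, se_a = momentum_and_error(46, 40, 253, 304, 54, 58, 40, 0.00, 2)
m_b, se_b = momentum_and_error(46, 32, 321, 528, 39, 70, 32, 0.19, 2)

ratio_a = m_a / se_a
ratio_b = m_b / se_b
# Different villages share no families, so the variances of the two momenta add.
ratio_diff = (m_b - m_a) / sqrt(se_a ** 2 + se_b ** 2)
```

Under these assumptions the ratios come out close to the 4.2, 16.2 and 8.1 quoted in the text.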

A minimum of three measurements of one population is necessary to deter- 
mine a social force. To determine its standard error all the correlations must 
be secured between every pair of measurements, each correlation derived from 
the part of the total population that is common to that pair of measurements. 
Obviously the data as currently reported from surveys and censuses and statistical bureaus do not meet these specifications. More rigorous analysis of social data and reporting of correlations in it is a prerequisite to the measurement of social forces and their significance.

AN APPROXIMATION TO "STUDENT'S" DISTRIBUTION*


By Walter A. Hendricks


I. Introduction 





























The function commonly known as “‘Student’s”’ distribution occupies a promi- 
nent position among the classic contributions to the field of statistics, not only 
for its intrinsic value but also for the stimulus which it gave to statistical re- 
search at the time of its discovery. 

The function, which may be written in the form,

$$dF_z = \frac{1}{B[\tfrac{1}{2}(n - 1), \tfrac{1}{2}]}\,(1 + z^2)^{-\frac{n}{2}}\,dz, \tag{1}$$

gives the distribution of the ratio, z, of the estimated arithmetic mean, $\bar{x}$, to the estimated standard deviation, s, for samples of n observations drawn from the normal universe specified by the arithmetic mean, zero, and the standard deviation, σ. This function, together with a table of values of its integral, was given by "Student."¹⁰

In view of the fact that similar distributions were subsequently found by 
Fisher? to arise in a larger variety of practical problems than was originally 
supposed, a table of values of a new integral was later given by ‘Student’! 
in which the distribution of a variable, t, defined by the relation, 


$$t = z\,(n - 1)^{1/2}, \tag{2}$$


rather than the distribution of z itself, was considered. Another table giving 
the distribution of ¢, in a form intended to be more convenient for use by re- 
search workers wishing to apply statistical methods to experimental data, was 
later given ‘by Fisher.* 

The integration of functions of the type defined by equation (1) involves 
considerable labor, a fact which has been somewhat embarrassing to practical 
statisticians interested in the distributions of z and ¢ for values of n larger than 
those included in the above-mentioned tables. The recent appearance of 
Tables of the Incomplete Beta-Function, prepared under the direction of 
Pearson,’ has considerably alleviated the difficulty, but the requirements of 
certain practical problems are not easily satisfied even with the aid of these 
tables. Consequently, simple approximations to the distributions of z and f, 








*A thesis submitted to the Faculty of the Columbian College of The George Washington University in part satisfaction of the requirements for the degree of Master of Arts.

which will be sufficiently accurate for most practical purposes, should be of some interest.

According to ‘‘Student,’’® the distribution of z tends to approach a normal 
curve with a standard deviation of $(n - 3)^{-1/2}$ for values of n greater than 10. However, Deming and Birge¹ have recently suggested that the distribution tends to approach a normal curve with a standard deviation of $(n - 1\tfrac{1}{2})^{-1/2}$.

This thesis presents a simple approximation to the distribution of z, which 
can be readily extended to the distribution of ¢ and which will give more accu- 
rate results than either of the above approximations. 


II. Approximation to the Distribution of Z 


The approximation presented here is based upon the assumption that, for large values of n, the distribution of s tends to approach a normal curve with the arithmetic mean, $\bar{s}$, and the standard deviation, $\sigma / (2^{1/2} n^{1/2})$, that is,

$$dF_s = \frac{n^{1/2}}{\pi^{1/2}\,\sigma}\,e^{-\frac{n}{\sigma^2}(s - \bar{s})^2}\,ds. \tag{3}$$


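Since the distribution of the estimated arithmetic mean, $\bar{x}$, is known to be normal, with the standard deviation, $\sigma / n^{1/2}$, we have for the joint distribution of s and $\bar{x}$:

$$dF_{s\bar{x}} = \frac{n}{2^{1/2}\,\pi\,\sigma^2}\,e^{-\frac{n}{\sigma^2}\left[\frac{1}{2}\bar{x}^2 + (s - \bar{s})^2\right]}\,ds\,d\bar{x}. \tag{4}$$

$\bar{s}$ may be expressed in terms of n and σ by the well-known relation,

$$\bar{s} = c_n\,\sigma, \tag{5}$$

in which the factor, $c_n$, is defined by the formula,

$$c_n = \Big(\frac{2}{n}\Big)^{1/2}\,\frac{\Gamma(\tfrac{1}{2}n)}{\Gamma[\tfrac{1}{2}(n - 1)]}. \tag{6}$$

If we write $c_n\sigma$ in place of $\bar{s}$ in equation (4) and make the transformation,

$$\bar{x} = sz, \tag{7}$$

we have for the joint distribution of s and z:

$$dF_{sz} = \frac{n}{2^{1/2}\,\pi\,\sigma^2}\,e^{-\frac{n}{\sigma^2}\left[\frac{1}{2}s^2 z^2 + (s - c_n\sigma)^2\right]}\,s\,ds\,dz. \tag{8}$$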


To find the distribution of z, all that is necessary is to write:

$$dF_z = k\left[\int_{-\infty}^{+\infty} e^{-(as - b)^2}\,s\,ds\right]dz, \tag{9}$$

in which:

$$k = \frac{n}{2^{1/2}\,\pi\,\sigma^2}\,e^{-\frac{n c_n^2 z^2}{z^2 + 2}}, \qquad a = \frac{n^{1/2}}{2^{1/2}\,\sigma}\,(z^2 + 2)^{1/2}, \qquad b = 2^{1/2}\,n^{1/2}\,c_n\,(z^2 + 2)^{-1/2}. \tag{10}$$





The integral in brackets in equation (9) can be evaluated without any difficulty. We have:

$$\int_{-\infty}^{+\infty} e^{-(as - b)^2}\,s\,ds = \frac{\pi^{1/2}\,b}{a^2}. \tag{11}$$

Substituting this value in equation (9) and replacing k, a, and b by the quantities which they represent, we obtain the following expression for the distribution of z:

$$dF_z = \frac{2\,n^{1/2}\,c_n}{\pi^{1/2}}\,e^{-\frac{n c_n^2 z^2}{z^2 + 2}}\,(z^2 + 2)^{-3/2}\,dz. \tag{12}$$






If we now define a new variable, u, by the relation,

$$u^2 = 2 n c_n^2\,\frac{z^2}{z^2 + 2}, \tag{13}$$

and make the appropriate substitutions in equation (12), we have, for the distribution function of u:

$$dF_u = \frac{1}{(2\pi)^{1/2}}\,e^{-\frac{1}{2}u^2}\,du. \tag{14}$$

Equation (14) is obviously a normal curve with unit standard deviation. We have thus deduced the interesting fact that, for values of n sufficiently large so that the distribution of s may be represented by a normal curve, the quantity, $2^{1/2} n^{1/2} c_n\,z / (z^2 + 2)^{1/2}$, is distributed as a normal deviate with unit standard deviation.

The accuracy of this approximation as compared with that of the approxima- 
tion suggested by “‘Student’’® and that of the more recent approximation sug- 
gested by Deming and Birge! may now be considered. As previously stated, 
the "Student" approximation is based on the assumption that the quantity, $(n - 3)^{1/2} z$, is distributed as a normal deviate with unit standard deviation for values of n greater than 10, while that suggested by Deming and Birge is based on the assumption that the quantity, $(n - 1\tfrac{1}{2})^{1/2} z$, is so distributed.
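A rough numerical comparison of the three approximations can be sketched as follows. The exact probability integral of (1) is obtained here by Simpson's rule rather than from published tables, and all function names are illustrative assumptions; this is a check on the formulas, not a reproduction of the author's Table 1.

```python
from math import exp, lgamma, sqrt
from statistics import NormalDist


def c_n(n):
    """Factor c_n of equation (6), computed through log-gamma for stability."""
    return sqrt(2.0 / n) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))


def exact_cdf_z(z, n, steps=2000):
    """Integral of density (1) from minus infinity to z (Simpson's rule on [0, z])."""
    beta = exp(lgamma((n - 1) / 2) + lgamma(0.5) - lgamma(n / 2))
    h = z / steps
    total = 1.0 + (1 + z * z) ** (-n / 2)
    for i in range(1, steps):
        x = i * h
        total += (4 if i % 2 else 2) * (1 + x * x) ** (-n / 2)
    return 0.5 + total * h / (3 * beta)


def approx_cdfs(z, n):
    """Normal approximations to the same integral under the three assumptions."""
    phi = NormalDist().cdf
    return {
        "student": phi(sqrt(n - 3) * z),
        "deming_birge": phi(sqrt(n - 1.5) * z),
        "hendricks": phi(sqrt(2 * n) * c_n(n) * z / sqrt(z * z + 2)),
    }
```

Calling `exact_cdf_z(0.5, 10)` and `approx_cdfs(0.5, 10)` gives one point of comparison for n = 10; other values of z can be examined in the same way.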

Table 1* gives values of the integral, $I_z$, defined by:

$$I_z = \frac{1}{B[\tfrac{1}{2}(n - 1), \tfrac{1}{2}]}\int_{-\infty}^{z} (1 + z^2)^{-\frac{1}{2}n}\,dz, \tag{15}$$


* All tables and charts to which reference is made are to be found in the Appendix.

for the case, n = 10, together with the corresponding approximate values obtained by making use of the three approximations suggested by "Student,"
Deming and Birge, and the present author, respectively. The exact values 
and those obtained by the ‘“‘Student”’ approximation were derived from values 
calculated by “‘Student’’® and given by Pearson.’ All other data in the table 
were calculated by the present author. 

An inspection of Table 1 shows that the values of I, based on the approxima- 
tion presented in this thesis agree very well with the corresponding exact values. 
The agreement is better than that found in the case of either of the other two 
approximations. The Deming and Birge approximation gives better results 
than the “Student” approximation for values of z in the neighborhood of zero, 
but for other values of z the opposite is true. 


III. Approximation to the Distribution of t


Since tables giving the distribution of the variable, t, have largely superseded those giving the distribution of z in practical statistical work, the feasibility of applying the above three approximations to the distribution of t is worthy of consideration.


The variable, t, has already been defined in terms of n and z by equation (2). If, in equation (12), we make the transformation,

$$z = (n - 1)^{-1/2}\,t, \tag{16}$$

we have, for the distribution function of t:

$$dF_t = \frac{2\,n^{1/2}\,(n - 1)\,c_n}{\pi^{1/2}}\,e^{-\frac{n c_n^2 t^2}{t^2 + 2(n - 1)}}\,[t^2 + 2(n - 1)]^{-3/2}\,dt. \tag{17}$$

If we now define a variable, v, by the relation,

$$v^2 = 2 n c_n^2\,\frac{t^2}{t^2 + 2(n - 1)}, \tag{18}$$

we have, for the distribution function of v:

$$dF_v = \frac{1}{(2\pi)^{1/2}}\,e^{-\frac{1}{2}v^2}\,dv. \tag{19}$$

Equation (19) shows that, for values of n sufficiently large so that the distribution of s may be represented by a normal curve, the quantity, $2^{1/2} n^{1/2} c_n\,t / [t^2 + 2(n - 1)]^{1/2}$, is distributed as a normal deviate with unit standard deviation. On the other hand, if we assume with "Student" that, for large values of n, the quantity, $(n - 3)^{1/2} z$, is normally distributed about zero with unit standard deviation, we should expect to find that the quantity, $\frac{(n - 3)^{1/2}}{(n - 1)^{1/2}}\,t$, is also distributed as a normal deviate with unit standard deviation. If the Deming and Birge approximation to the distribution of z is assumed to be valid, we should expect to find that the quantity, $\frac{(n - 1\frac{1}{2})^{1/2}}{(n - 1)^{1/2}}\,t$, is distributed as a normal deviate with unit standard deviation.

To test the accuracy of each of these three approximations to the distribution of t, we may make use of the well-known table of values of t given by Fisher.³ This table is so constructed that a value of t corresponding to a given number of "degrees of freedom" and a given value of "P" may be read from the table, where P is defined by the relation,

$$P = 2\int_{t}^{\infty} dF_t. \tag{20}$$

The entries in the last line of the table, corresponding to an infinite number of "degrees of freedom," are the deviates of a normal curve with unit standard deviation.


To test the accuracy of the "Student" approximation, we may calculate the entries for a line of this table, corresponding to n − 1 "degrees of freedom," by multiplying the entries in the last line of the table by $\frac{(n - 1)^{1/2}}{(n - 3)^{1/2}}$. These approximate values of t may then be compared with the exact values given in the table. The accuracy of the Deming and Birge approximation may be tested in the same manner, except that in this case the entries in the last line of the table should be multiplied by $\frac{(n - 1)^{1/2}}{(n - 1\frac{1}{2})^{1/2}}$. To test the accuracy of the approximation given by equation (19), we may calculate the values of t corresponding to n − 1 "degrees of freedom" by means of the relation,

$$t^2 = \frac{2(n - 1)\,v^2}{2 n c_n^2 - v^2}, \tag{21}$$

in which the entries in the last line of the table are to be taken as the values of v.
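The three ways of converting a normal deviate v into an approximate t can be sketched in code. The function names are illustrative assumptions; the normal deviate is taken from Python's `statistics.NormalDist` instead of the last line of Fisher's table, and relation (21) is only usable while $2nc_n^2 > v^2$.

```python
from math import exp, lgamma, sqrt
from statistics import NormalDist


def c_n(n):
    """Factor c_n of equation (6)."""
    return sqrt(2.0 / n) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))


def approximate_t(p_two_sided, n):
    """Approximate t for n - 1 degrees of freedom at a two-sided probability P."""
    v = NormalDist().inv_cdf(1 - p_two_sided / 2)  # normal deviate for this P
    denominator = 2 * n * c_n(n) ** 2 - v * v
    if denominator <= 0:
        raise ValueError("relation (21) needs 2*n*c_n**2 > v**2")
    return {
        "student": v * sqrt((n - 1) / (n - 3)),
        "deming_birge": v * sqrt((n - 1) / (n - 1.5)),
        "hendricks": sqrt(2 * (n - 1) * v * v / denominator),
    }
```

For P = .05 and n = 10 the tabulated exact value for 9 degrees of freedom is 2.262, which gives a single point at which the three results of `approximate_t(0.05, 10)` can be compared.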

Table 2 gives the exact values of t corresponding to the values of P given in Fisher's table for n = 10, together with the approximate values calculated by means of each of the above three approximations. This comparison of the accuracies of the three approximations is equivalent to the comparisons presented in Table 1. The conclusions which may be drawn are in agreement with those which have already been drawn from that table.

In order to test the behavior of each of the approximations for a larger value of n, values of t corresponding to the different values of P were calculated for n = 30. The results are presented in Table 3. The rank of each of the three approximations, with regard to accuracy, for n = 30 is the same as for n = 10. Although all three give more accurate results for the larger value of n, the superiority of the approximation presented in this thesis is quite apparent.

For extremely large values of n, all three approximations evidently tend to become one-hundred percent accurate, for the distribution of t tends to become normal as n is increased indefinitely. In the case of the "Student" and Deming and Birge approximations, the ratios, $\frac{(n - 1)^{1/2}}{(n - 3)^{1/2}}$ and $\frac{(n - 1)^{1/2}}{(n - 1\frac{1}{2})^{1/2}}$, obviously approach unity, respectively, as n becomes very large. The approximate value of t given by equation (21) also tends to approach the normal deviate, v, as n is increased, for we have:

$$\lim_{n \to \infty} t^2 = \lim_{n \to \infty} \frac{2(n - 1)\,v^2}{2 n c_n^2 - v^2} = \lim_{n \to \infty}\left[\frac{2 n v^2}{2 n c_n^2 - v^2} - \frac{2 v^2}{2 n c_n^2 - v^2}\right] = v^2. \tag{22}$$


IV. Discussion 


The greater accuracy of the approximation to the distribution of z presented
in this thesis apparently cannot be explained by the hypothesis that the
distribution of s becomes normal more rapidly than the distribution of z as n is
increased. Table 4 presents values of the ordinates of the normal curve with
unit standard deviation, together with the corresponding ordinates of the exact
distributions of the quantities, $\frac{2^{1/2}n^{1/2}}{\sigma}(s - \bar{s})$,
$(n-3)^{1/2}z$, $(n-1\frac{1}{2})^{1/2}z$, and
$\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}}$, for n = 10. Although the
distribution of $\frac{2^{1/2}n^{1/2}}{\sigma}(s - \bar{s})$ seems to follow the
normal curve more closely than does the distribution of $(n-3)^{1/2}z$, the
opposite seems to be true in the case of the distribution of
$(n-1\frac{1}{2})^{1/2}z$. The distribution of
$\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}}$, however, follows the normal curve
quite closely.
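Ordinates of the kind listed in Table 4 can be sketched by a change of variable. The block below assumes that z follows "Student's" original distribution, with density proportional to $(1+z^2)^{-n/2}$, and that $c_n = (2/n)^{1/2}\,\Gamma(n/2)/\Gamma((n-1)/2)$; these forms, and the function names, are assumptions made for illustration and are not taken from this passage.

```python
from math import exp, lgamma, log, pi, sqrt


def c_n(n):
    """Assumed definition: c_n = sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2)."""
    return sqrt(2.0 / n) * exp(lgamma(n / 2.0) - lgamma((n - 1) / 2.0))


def z_density(z, n):
    """Assumed exact density of z: K * (1 + z^2)^(-n/2)."""
    log_k = lgamma(n / 2.0) - lgamma((n - 1) / 2.0) - 0.5 * log(pi)
    return exp(log_k) * (1.0 + z * z) ** (-n / 2.0)


def scaled_z_ordinate(u, n, a):
    """Ordinate at u of the variable u = a * z."""
    return z_density(u / a, n) / a


def hendricks_ordinate(u, n):
    """Ordinate at u of u = sqrt(2n) * c_n * z / sqrt(2 + z^2)."""
    big_a = 2.0 * n * c_n(n) ** 2
    if u * u >= big_a:
        return 0.0  # the transformed variable is bounded by sqrt(2n) * c_n
    z = sqrt(2.0) * u / sqrt(big_a - u * u)               # inverse transformation
    dz_du = sqrt(2.0) * big_a / (big_a - u * u) ** 1.5    # Jacobian dz/du
    return z_density(z, n) * dz_du


if __name__ == "__main__":
    n = 10
    for u in (0.0, 1.0, 2.0):
        print(u,
              round(scaled_z_ordinate(u, n, sqrt(n - 3.0)), 4),
              round(scaled_z_ordinate(u, n, sqrt(n - 1.5)), 4),
              round(hendricks_ordinate(u, n), 4))
```

At a deviation of 1.0 the first and last columns can be compared with the Table 4 entries .2256 and .2426.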
The behavior of these distributions for n = 10 can be observed more easily
in Figures 1, 2, and 3 in which the frequency curves of
$\frac{2^{1/2}n^{1/2}}{\sigma}(s - \bar{s})$, $(n-3)^{1/2}z$, and
$(n-1\frac{1}{2})^{1/2}z$ are respectively plotted together with the normal
curve with unit standard deviation. The frequency curve of
$\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}}$ was not plotted because this curve
follows the normal curve so closely that the two curves could not be
distinguished when plotted on the scale used in the other three charts.

The most reasonable conclusion which can be drawn from Table 4 and Figures
1, 2, and 3 is that the departure of the exact distribution of s from the normal
curve has very little effect in destroying the normality of the distribution
of $\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}}$.
V. Values of the Factor, $c_n$


For the practical application of the approximations to the distributions of
z and t presented in this thesis, a table of values of the factor, $c_n$, is
required. Values of this factor, for values of n as high as 100, have been
tabulated by Pearson and by Shewhart. For values of n greater than 100, $c_n$
may be calculated accurately to at least five significant figures by the
following relation, given by Pearson and by Deming and Birge:

$$c_n = 1 - \frac{3}{4n} - \frac{7}{32n^2}. \tag{23}$$

Table 5 presents values of $c_n$ for some large values of n, calculated by the
present author. For values of n not included in this table, $c_n$ may be
calculated by means of equation (23) just as rapidly as by interpolation in the
table.
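As a rough check on equation (23), the series can be compared with a direct evaluation of $c_n$. The sketch assumes that equation (23) reads $c_n = 1 - 3/(4n) - 7/(32n^2)$, a reading which reproduces the entries of Table 5, and that the exact factor is $c_n = (2/n)^{1/2}\,\Gamma(n/2)/\Gamma((n-1)/2)$. The function names are hypothetical.

```python
from math import exp, lgamma, sqrt


def c_n_exact(n):
    """Assumed exact form: sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2)."""
    return sqrt(2.0 / n) * exp(lgamma(n / 2.0) - lgamma((n - 1) / 2.0))


def c_n_series(n):
    """Assumed reading of equation (23)."""
    return 1.0 - 3.0 / (4.0 * n) - 7.0 / (32.0 * n * n)


if __name__ == "__main__":
    # Show both evaluations side by side for a few large n.
    for n in (100, 1000, 10000):
        print(f"n = {n:>6}  series = {c_n_series(n):.5f}  "
              f"exact = {c_n_exact(n):.5f}")
```

For n = 100 and n = 1000 the series values round to .99248 and .99925, the figures given in Table 5.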





VI. Summary and Conclusions 


For values of n sufficiently large so that the distribution of s may be
represented by a normal curve, the quantities,

$$\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}} \quad \text{and} \quad \frac{2^{1/2}n^{1/2}c_n t}{[2(n-1)+t^2]^{1/2}},$$

are distributed as normal deviates with unit standard deviation. The results
obtained by assuming a normal distribution of s are more accurate than those
obtained by assuming that either $(n-3)^{1/2}z$ or $(n-1\frac{1}{2})^{1/2}z$ is
distributed as a normal deviate with unit standard deviation. For extremely
large values of n, the distribution of each of the above quantities tends to
approach a normal curve with a mean of zero and unit standard deviation.
VIII. Appendix
TABLE 1

Exact values of $I_z$ and approximate values, derived from tables of the normal
probability integral, for n = 10

                                   I_z
   z      Exact     "Student"        Deming & Birge   Hendricks
          value     approximation    approximation    approximation

 -2.0     .0001      .0000            .0000            .0004
 -1.8     .0002      .0000            .0000            .0006
 -1.6     .0005      .0000            .0000            .0010
 -1.4     .0011      .0001            .0000            .0018
 -1.2     .0029      .0007            .0002            .0038
 -1.0     .0075      .0041            .0018            .0086
 - .8     .0199      .0171            .0098            .0211
 - .6     .0527      .0562            .0401            .0535
 - .4     .1304      .1448            .1218            .1307
 - .2     .2816      .2984            .2799            .2817

   .0     .5000      .5000            .5000            .5000
 + .2     .7184      .7016            .7201            .7183
 + .4     .8696      .8552            .8782            .8693
 + .6     .9473      .9438            .9599            .9465
 + .8     .9801      .9829            .9902            .9789
 +1.0     .9925      .9959            .9982            .9914
 +1.2     .9971      .9993            .9998            .9962
 +1.4     .9989      .9999           1.0000            .9982
 +1.6     .9995     1.0000           1.0000            .9990
 +1.8     .9998     1.0000           1.0000            .9994
 +2.0     .9999     1.0000           1.0000            .9996



























TABLE 2

Exact values of t corresponding to different values of P and approximate values,
derived from normal deviates, for n = 10

                                    t
   P      Exact     "Student"        Deming & Birge   Hendricks
          value     approximation    approximation    approximation

  .90      .129      .142             .129             .129
  .80      .261      .287             .261             .261
  .70      .398      .437             .396             .398
  .60      .543      .595             .540             .544
  .50      .703      .765             .694             .703
  .40      .883      .954             .866             .884
  .30     1.100     1.175            1.066            1.104
  .20     1.383     1.453            1.319            1.386
  .10     1.833     1.865            1.693            1.844
  .05     2.262     2.222            2.017            2.290
  .02     2.821     2.638            2.394            2.896
  .01               2.921


TABLE 3

Exact values of t corresponding to different values of P and approximate values,
derived from normal deviates, for n = 30

                                    t
   P      Exact     "Student"        Deming & Birge   Hendricks
          value     approximation    approximation    approximation

  .90      .127      .130             .127             .127
  .80      .256      .263             .256             .256
  .70      .389      .399             .389             .389
  .60      .530      .543             .529             .530
  .50      .683      .699             .680             .683
  .40      .854      .872             .849             .854
  .30     1.055     1.074            1.045            1.055
  .20     1.311     1.328            1.293            1.312
  .10     1.699     1.705            1.659            1.700
  .05     2.045     2.031            1.977            2.047
  .02     2.462     2.411            2.347            2.466
  .01     2.756     2.670            2.598            2.764
TABLE 4

Ordinates of the normal curve with unit standard deviation and ordinates of the
exact distribution functions of $\frac{2^{1/2}n^{1/2}}{\sigma}(s - \bar{s})$,
$(n-3)^{1/2}z$, $(n-1\frac{1}{2})^{1/2}z$, and
$\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}}$, for n = 10

Columns: (A) normal curve; (B) $\frac{2^{1/2}n^{1/2}}{\sigma}(s - \bar{s})$;
(C) $(n-3)^{1/2}z$; (D) $(n-1\frac{1}{2})^{1/2}z$;
(E) $\frac{2^{1/2}n^{1/2}c_n z}{(2+z^2)^{1/2}}$

                        Ordinate of distribution function
 Deviation
 from mean      (A)       (B)       (C)       (D)       (E)

   -3.0        .0044     .0006     .0071     .0108     .0034
   -2.5        .0175     .0085     .0181     .0254     .0156
   -2.0        .0540     .0454     .0459     .0581     .0544
   -1.5        .1295     .1356     .1092     .1234     .1306
   -1.0        .2420     .2663     .2256     .2290     .2426
   - .5        .3521     .3751     .3692     .3454     .3522
     .0        .3989     .3999     .4400     .3991     .3990
   + .5        .3521     .3348     .3692     .3454     .3522
   +1.0        .2420     .2245     .2256     .2290     .2426
   +1.5        .1295     .1283     .1092     .1234     .1306
   +2.0        .0540     .0560     .0459     .0581     .0544
   +2.5        .0175     .0213     .0181     .0254     .0156
   +3.0        .0044     .0068     .0071     .0108     .0034














TABLE 5

Values of $c_n$ for large values of n

      n       c_n             n       c_n

    100     .99248          900     .99917
    150     .99499         1000     .99925
    200     .99624         2000     .99962
    250     .99700         3000     .99975
    300     .99750         4000     .99981
    350     .99786         5000     .99985
    400     .99812        10000     .99992
    450     .99833        20000     .99996
    500     .99850        30000     .99997
    600     .99875        40000     .99998
    700     .99893        50000     .99998
    800     .99906       100000     .99999
Fig. 1. Exact distribution of $\frac{2^{1/2}n^{1/2}}{\sigma}(s - \bar{s})$ for n = 10 and normal curve with unit standard deviation
(Key: exact distribution; normal curve)